In many scientific problems an essential step toward their solution is to
accomplish modeling and identification of some object or system under
investigation. As defined here, system identification is the process of
deriving a mathematical system model from observed data in accordance with
some predetermined criterion. The growing use of system identification is
the result of demands imposed by advances in other scientific and
technological areas such as biomedicine, physics, electrical engineering,
and computer science.

Modeling and identification as a methodology dates back to Galileo
(1564-1642), who is also important as the founder of dynamics. Galileo was
the first to establish the law of falling bodies, which states that a body
falling freely in a vacuum has a velocity that increases at a constant rate.
The key to Galileo’s success was his combination of theoretical and
experimental work, with patience in observation and boldness in framing
hypotheses. Unfortunately, his ideas brought him into conflict with
Aristotelian physics and the Church, and in 1633 Galileo was forced to abjure
his “heresies” by the Inquisition, which succeeded in putting an end to
science in Italy.

Hence, the important role of modeling and identification in science and
technology is to establish empirical relationships between observed
variables. The standard view is that mathematical models are computational
devices that should be distinguished from theories about physical structure.
In a mature form, modeling can even be used in a theory when attempting to
explain empirical laws by incorporating them into a deductive system.
Although analogies might certainly be of great value to guide further
research, it is important to state that an approach solely based on appeal
to a model or an analogy is insufficient for the purposes of scientific
explanation. For this reason modeling and identification are important in
the early phases of scientific work where hypotheses are formulated and
tested, refuted, or confirmed. It is hoped that the text will prove useful
in such work.

As modeling and identification are omitted or neglected in many graduate
curricula, the knowledge acquired by students is largely confined to
techniques whose applicability they cannot ascertain. This volume represents
an attempt to remedy this by providing an integrated collection of
laboratory experiments that illustrate the variety of situations to which
quantitative identification and modeling methods may be applied. The book
has grown out of lecture notes for a course on system identification held at
the Lund Institute of Technology. The text is intended for a senior-level or
graduate-level course in system identification for students with some
background in applied mathematics (control theory) and statistics
(stochastic processes). As a basic course text, it is intended to furnish a
broad perspective of this area of research and prepare the student for
various forms of further study and research. Some sections and exercises in
the book are marked * to indicate more advanced or specialized subject
matter. I have tried to establish some generalizations for this field, which
to some extent is little more than an empirical art. Accordingly, I try to
develop an appreciation of limitations and capabilities by discussing
representative examples in major areas of application.

A necessary complement to this book is some software for basic numerical
computation such as Ctrl-C, Mathematica, Matlab, Matrix-X, X-math, etc.
Implementation of identification algorithms without such support may easily
lead to numerically inaccurate results. Basic prerequisites include
numerical implementations of linear algebra, and preferably some numerical
optimization. Other valuable tools include software for the simulation of
dynamical systems, such as the Omola, Simnon, or Simulink programs, and some
graphics interface. Several software houses also offer supplementary
software for solving problems of signal processing and identification.

Among those who corrected errors and suggested improvements are: Leif
Andersson, Bo Bernhardsson, Ola Dahl, Per A. Fransson, Kjell Gustafsson,
Anders Hansson, Ulf Holmberg, Ulf Jonsson, Mats Lilja, and Henrik Olsson, to
all of whom I owe sincere thanks. I would also like to thank Professor Måns
Magnusson and our staff at the Lund University Hospital, with whom I have
collaborated in the practical application of identification.

I would also like to thank my esteemed colleagues Leif Andersson, Karl J.
Åström, Per Hagander, Jan Holst, Ulla Holst, Tore Hägglund, Georg Lindgren,
Holger Rootzén, and Björn Wittenmark at the Departments of Automatic Control
and Mathematical Statistics at Lund Institute of Technology for creating the
atmosphere in which I teach and work. I am also grateful for the research
semester granted by the Lund Institute of Technology, which has been helpful
during preparation of the manuscript. Compiling a textbook is naturally a
time-consuming undertaking, and I would like to thank my family for generous
support of an occasionally preoccupied family member.

Introduction


1.1 WHY MODELS? 

Decision making and problem solving are dependent upon access to adequate
information about the problem to be solved. Often the available information
is originally in the form of data or observations that require
interpretation before further analysis (and decisions) can be made. The
derivation of a relevant system description from observed data is termed
system identification, and the resultant system description a model.
Why are models needed? A general answer is that modeling and identification
methods are needed for the interpretation of (often indirect) observations
and measurements obtained from some system of study. As models constitute
the necessary link between experiments and decision making, modeling and
identification are manifestly important for all applied science.

A model represents essential aspects of a system with respect to certain
purposes and may take on several different forms such as

- Cognitive models (human concepts)

- Normative models (purpose oriented)

- Descriptive models (behavior oriented)

- Functional models (action and control oriented)

Cognitive models are the conceptual models underlying human reasoning and
perception, inductive learning, decision making, and planning, i.e., the
human effort to understand and control the ambient world. Another category
of models is that of normative models, which define the specified or desired
function, goal, or purpose of a system or process. Such models are often
found in engineering design and government regulations.

Other classes of models arise from the need for descriptive and functional
models for scientific and technological purposes. Such models are often
subdivided into quantitative models (described by numbers or parameters) and
qualitative models (described by categorical data). A necessary scientific
background for the development of more precise normative models, as well as
of cognitive models, includes an understanding of descriptive and functional
quantitative models based on empirical data as used in science, technology,
and economics. The value of quantitative models also derives from their
ability to predict, and therefore to act upon, phenomena. For this reason,
models for scientific and technological use are often quantitative models,
and a central problem is to fit such models to data. From an empirical point
of view, it is natural to start by collecting input-output data from a
system in operation, where experiments are performed by manipulating the
input. Such models serve to determine criteria of lawful change and thus
fulfill at least three purposes:

- Prediction: Given a description of the system over some period of time
and the set of rules governing the change, predict the way the system
will behave in future time.

- Learning new rules: Given a description of the world at different times,
produce a set of rules which accounts for regularities in the system.

- Data compression: Produce a model that represents data in a compact
form and with low complexity.

Figure 1.1 An input-output relationship or a stimulus-response relationship.

Quantitative models may, of course, be formulated with different degrees of 
complexity, detail, and internal structure. A purely behavioristic model devoid 
of assumptions regarding internal structure is the black box model, which 
simply models a causal relationship between input and output (cf. Fig. 1.1).
The model may be static or dynamic, where a dynamical system is understood 
to mean one where output is determined not only by its input but also by 
some internal state that, in turn, may depend on previous input. If the state 
variables vary with time, then the dynamical system is said to undergo a 
process. 

The choice of a black box model, which lacks internal structure, may be
motivated by a desire to avoid irrelevant detail or simply by an inability
to access the interior of the system under consideration. Such approaches
are common in control systems analysis and in biological and biomedical
research.


1.2 MODELING 

The search for relevant models may also start from another point. Provided
that the structure or design of a system is largely known, it is often
possible to produce a block diagram or some network sketch of the system and
its functional components. Such practice is called modeling, which usually
proceeds by starting from a set of (ideal) model components and gives rise
to a physical model with some network structure. The resultant network
behavior can be determined from the network structure and from the
properties of the balance equations between all interacting components. The
manner in which individual subsystems interconnect and act upon each other
provides the overall system with an organizational pattern.

The behavior of a system obtained from balance equations derived from physics 
may be described in detail by a set of algebraic and differential equations, 
which in turn may be solved analytically or by simulation. 
Figure 1.2 Physical models: an electrical circuit with components (capacitor,
resistor, and coil) and a mechanical system with components (mass, damper,
and spring), with coefficients C, R, L and m, d, k.


Example 1.1—Simple electrical and mechanical models 
Consider Fig. 1.2, which shows simple physical models, one an electrical
circuit with a resistor, a capacitor, and an inductor, and the other a
mechanical model with a spring, a damper, and a mass:

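$$
\begin{aligned}
E(t) &= R\,I(t) + \frac{1}{C}\int I(t)\,dt + L\,\frac{dI(t)}{dt} && \text{Kirchhoff's voltage law}\\
m\,\ddot{x}(t) &= -d\,\dot{x}(t) - k\,x(t) + F(t) && \text{Newton's second law}
\end{aligned}
\tag{1.1}
$$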

It is often natural to distinguish between parameters and variables, where
parameters denote the constant (or slowly varying) coefficients, e.g., R, C,
L and m, d, k, and variables (or signals) denote time-varying quantities,
e.g., voltage $E(t)$, current $I(t)$, force $F(t)$, velocity $\dot{x}(t)$,
and position $x(t)$. A parameter can also be regarded as a variable that has
a constant value for a specific purpose or process.

The causal relationship between different variables can be emphasized by
adopting the terminology of input (e.g., voltage $E(t)$ or force $F(t)$) for
the forcing variable, and output (e.g., current $I(t)$ or momentum
$p = m\dot{x}$) for the dependent variable. If we apply the Laplace transform
$\mathcal{L}\{\cdot\}$ with zero initial conditions and solve for the
relationship between input and output, we obtain the transfer functions


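$$
\begin{aligned}
G_E(s) &= \frac{\mathcal{L}\{I(t)\}}{\mathcal{L}\{E(t)\}} = \frac{sC}{s^2LC + sRC + 1}\\
G_F(s) &= \frac{\mathcal{L}\{p(t)\}}{\mathcal{L}\{F(t)\}} = \frac{ms}{ms^2 + ds + k}
\end{aligned}
\tag{1.2}
$$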
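As a numerical illustration of Eq. (1.2), the sketch below evaluates the current/voltage transfer function $sC/(s^2LC + sRC + 1)$ and the momentum/force transfer function $ms/(ms^2 + ds + k)$ along the imaginary axis $s = j\omega$. The function names and parameter values are hypothetical; the mapping $L \leftrightarrow m$, $R \leftrightarrow d$, $1/C \leftrightarrow k$ is the standard force-voltage analogy, under which the mechanical transfer function equals $m$ times the electrical one.

```python
import math


def g_electrical(s: complex, R: float, L: float, C: float) -> complex:
    """Current/voltage transfer function sC / (s^2 LC + sRC + 1), Eq. (1.2)."""
    return s * C / (s**2 * L * C + s * R * C + 1)


def g_mechanical(s: complex, m: float, d: float, k: float) -> complex:
    """Momentum/force transfer function ms / (ms^2 + ds + k), Eq. (1.2)."""
    return m * s / (m * s**2 + d * s + k)


# Hypothetical parameter values; the electrical ones follow the
# force-voltage analogy L <-> m, R <-> d, 1/C <-> k.
m, d, k = 2.0, 0.5, 4.0
R, L, C = d, m, 1.0 / k

for omega in (0.1, 1.0, math.sqrt(k / m), 10.0):
    s = 1j * omega
    ge = g_electrical(s, R, L, C)
    gf = g_mechanical(s, m, d, k)
    # Under the analogy the two models differ only by the constant factor m.
    print(f"omega={omega:7.4f}  |G_E|={abs(ge):.4f}  |G_F|/m={abs(gf) / m:.4f}")
```

At $\omega = \sqrt{k/m}$ the terms $ms^2$ and $k$ cancel, so the mechanical transfer function reduces to the real value $m/d$, which is the maximum of its magnitude over $\omega$.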


The transfer functions obtained with physical parametrizations in terms of
the parameters R, L, C (or m, d, k) are special cases of input-output
models. In general, the models thus obtained are called parametric models
(or parametrized models) and represent a given structure where the
parameters are sometimes not known and must be estimated. During modeling
and experimentation, the engineer or scientist drafts a block diagram or a
sketch of the system and its functional components, then perturbs the system
and traces the response through it. This procedure requires continuous
measurement of responses as well as quantitative control over inputs. The
subsequent mathematical problem consists in determining which differential
equations govern the behavior of the system from which data have been
obtained. This is the reverse of the usual task of solving given
differential equations, and the parameter estimation problem is therefore
called an inverse problem. Inverse problems typically involve models of
known physical structure, physical parameterization, and interconnections
but with unknown parameters. Such models are often designated grey box
models and are commonly found in physics, technology, and control systems
analysis. An example from physics is the inverse scattering problem. Here
the scientific objective is to determine the physical structure that
generates a given scattering pattern, which, of course, is valuable for the
interpretation of data.

Many modeling problems describe a condition of equilibrium associated with a 
minimum of energy. Such approaches often suffice to describe static properties 
and are natural first steps in many cases of modeling. A second step is to 
describe oscillations or transients around that equilibrium when the energy is 
conserved, released, or dissipated. Such approaches are useful for modeling 
resonances and vibrations in mechanical structures, monitoring variations 
in the neutron flow dynamics of nuclear reactors, or modeling vocal tract 
dynamics in speech processing. 
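For the mechanical model of Example 1.1, oscillations around the equilibrium are characterized by the undamped natural frequency $\omega_n = \sqrt{k/m}$ and the damping ratio $\zeta = d/(2\sqrt{km})$, which follow from writing $ms^2 + ds + k = m(s^2 + 2\zeta\omega_n s + \omega_n^2)$. With $d = 0$ the energy is conserved; with $d > 0$ it is dissipated. The sketch below computes both quantities; the helper names and parameter values are hypothetical.

```python
import math


def natural_frequency_and_damping(m: float, d: float, k: float) -> tuple[float, float]:
    """Return (omega_n, zeta) for m x'' + d x' + k x = F, assuming m, k > 0, d >= 0."""
    omega_n = math.sqrt(k / m)            # undamped natural frequency [rad/s]
    zeta = d / (2.0 * math.sqrt(k * m))   # dimensionless damping ratio
    return omega_n, zeta


def classify(zeta: float) -> str:
    """Qualitative transient behavior around the equilibrium x = 0.

    Exact comparisons with 0 and 1 are only meaningful for idealized values.
    """
    if zeta == 0.0:
        return "undamped: energy conserved, sustained oscillation"
    if zeta < 1.0:
        return "underdamped: energy dissipated, decaying oscillation"
    if zeta == 1.0:
        return "critically damped"
    return "overdamped: energy dissipated, no oscillation"


for m, d, k in [(1.0, 0.0, 4.0), (2.0, 0.5, 4.0), (1.0, 2.0, 1.0), (1.0, 5.0, 4.0)]:
    omega_n, zeta = natural_frequency_and_damping(m, d, k)
    print(f"m={m}, d={d}, k={k}: omega_n={omega_n:.3f} rad/s, "
          f"zeta={zeta:.3f} -> {classify(zeta)}")
```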

Modeling of oscillating resonant behavior is important in order to predict
the stability, amplitude, and frequency of resonant or limit cycle
oscillations. Commonly used tools are autoregressive linear models (cf.
Chapter 5) and nonlinear methods such as describing function analysis (cf.
Chapter 12). More recently, the study of nonlinear oscillations has focused
to some extent on models of aperiodic oscillatory behavior (“chaos”), for
instance, in hydrodynamics.

The nature of some sustained oscillating behavior is difficult to explain from 
conditions of equilibrium and conservation of energy. Consider, for instance, 
such periodic biological phenomena as circadian, monthly, and annual rhythms 
or periodic phenomena observed in glycolytic metabolism and respiratory
mechanics. Some of these systems exhibit limit cycle behavior where the system
evolves toward a well-defined oscillation, no matter what initial conditions are 
imposed. 
The modeling complexity required obviously depends upon the purpose of the
modeling and identification. Modeling is in this respect an art based on the 
ability to visualize physical and other interconnections where all basic and 
applied knowledge contributes to modeling expertise. 


1.3 THE PURPOSE OF IDENTIFICATION 

As the modeling complexity required depends upon the purpose of the
modeling and identification, it is desirable to distinguish a number of
important application areas. For instance, it is necessary to stress the
basic scientific need for quantitative models, which in turn presupposes an
ability to predict new phenomena. Prediction and forecasting are closely
related to modeling and identification, as attempts to predict future states
of a system are limited by the accuracy of the model used and by the range
of correlations of the random processes affecting the system. In this
context, it is important to represent external actions and external
perturbations, extracting and using knowledge of the statistical
characteristics of random variables, as there is usually little theoretical
or practical possibility of determining such characteristics in advance.

Control systems analysis and design provide a rich field for the application 
of modeling and identification. A control mechanism is one that senses the 
control error, i.e., the difference between desired and actual states, and then 
initiates a series of processes and actions, which in turn produce counteractive 
effects to minimize the control error. This control principle introduces the 
important concept of feedback, which for linear systems can be illustrated in 
terms of a transfer function. This case also allows quantitative predictions 
to be made concerning crucial features of control systems such as stability 
conditions and the development of oscillatory behavior. 
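As a small illustration of such transfer-function predictions, consider the mechanical model of Example 1.1 with position as output, $X(s)/F(s) = 1/(ms^2 + ds + k)$, under proportional feedback $F = K(r - x)$, where the gain $K$ and the reference $r$ are assumptions introduced here. The closed-loop transfer function is $K/(ms^2 + ds + k + K)$, so stability and oscillatory behavior can be read off the roots of its denominator. The sketch below computes those roots; the function names and values are hypothetical.

```python
import cmath


def closed_loop_poles(m: float, d: float, k: float, K: float) -> tuple[complex, complex]:
    """Roots of m s^2 + d s + (k + K): the closed-loop characteristic polynomial
    for plant 1/(m s^2 + d s + k) under proportional feedback F = K (r - x)."""
    disc = cmath.sqrt(d * d - 4.0 * m * (k + K))
    return (-d + disc) / (2.0 * m), (-d - disc) / (2.0 * m)


def is_stable(poles) -> bool:
    """Continuous-time stability: all poles strictly in the left half-plane."""
    return all(p.real < 0.0 for p in poles)


def is_oscillatory(poles, tol: float = 1e-12) -> bool:
    """Complex-conjugate poles give an oscillatory (ringing) response."""
    return any(abs(p.imag) > tol for p in poles)


m, d, k = 2.0, 0.5, 4.0  # hypothetical plant parameters
for K in (0.0, 10.0, -6.0):
    poles = closed_loop_poles(m, d, k, K)
    print(f"K={K:5.1f}  poles={poles[0]:.3f}, {poles[1]:.3f}  "
          f"stable={is_stable(poles)}  oscillatory={is_oscillatory(poles)}")
```

Increasing $K$ raises the natural frequency $\sqrt{(k+K)/m}$ and lowers the damping ratio $d/(2\sqrt{m(k+K)})$, so the loop becomes more oscillatory, while $K < -k$ (positive feedback) moves one pole into the right half-plane.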

Another established application area is signal processing for state
estimation of variables not available to measurement. An example is how to
estimate the velocity from data records of position in Example 1.1. This can
be viewed as a type of indirect measurement where modeling and
identification are necessary for correct calibration, e.g., by comparison
with some other standardized method. A calibration can often be reduced to a
linear regression problem, which provides a linear fit between the outputs
of two different measurement devices.
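A calibration of this kind can be sketched as an ordinary least-squares fit of a straight line $y \approx a + bx$, where $x$ are readings from a reference device and $y$ are readings from the device under test; inverting the fitted line then maps new readings back to reference units. The readings and function names below are hypothetical, and the sketch assumes the reference device is error-free, so that all measurement error is attributed to $y$.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0:
        raise ValueError("reference readings must not all be equal")
    b = sxy / sxx              # slope: gain of the device under test
    a = mean_y - b * mean_x    # intercept: offset of the device under test
    return a, b


def to_reference_units(y_new: float, a: float, b: float) -> float:
    """Invert the calibration line to express a new reading in reference units."""
    return (y_new - a) / b


# Hypothetical paired readings: reference device x, device under test y.
x_ref = [0.0, 1.0, 2.0, 3.0, 4.0]
y_dev = [1.1, 2.9, 5.0, 7.1, 8.9]
a, b = fit_line(x_ref, y_dev)
print(f"offset a = {a:.3f}, gain b = {b:.3f}")
print(f"reading 6.0 corresponds to {to_reference_units(6.0, a, b):.3f} reference units")
```

If the reference readings are themselves noisy, ordinary least squares gives a biased slope, which is one of the bias problems taken up later in the context of linear regression.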

Simulation based on mathematical models is widely used for the assessment 
of model complexity, for engineering design, or for operator training, all of 
which require adequate modeling and adequate input. Examples from electric
power engineering are abundant, and such simulation models often serve as 
the basis of monitoring or supervision, error detection, and process diagnosis 
in large systems. 

For economical operation and maintenance, industrial processes in continuous
operation require system optimization, which in turn requires very accurate
modeling. The result of optimization is often given as a function of system
parameters, whose values must be obtained by reliable and accurate modeling
and identification. A special case is autonomous systems automation
(adaptation), which for its implementation presupposes some learning
capacity or parameter estimation.


1.4 SYSTEM AND MODEL COMPLEXITY 

The modeling complexity needed depends on the purpose of the modeling and 
identification, and it is natural to distinguish among the following classes of 
models depending upon the detail and precision of modeling: 

Qualitative and categorical models (often used for fault detection and process 
diagnosis) are often easy to derive from physical principles. Although the basic 
idea of qualitative modeling is simple and straightforward, reasoning based on 
qualitative models may lead to ambiguous results. The models are defined
by causal relationships or categorical data (e.g., models based on Boolean 
algebra, such as logical circuits), and they often serve as a complement or a 
user interface to more detailed quantitative models. 

A step toward quantitative models can be taken by adopting
semi-quantitative models, where variables take on such categorical values
as hot-normal-cold or high-normal-low. Such models are popular in statistics
and in various branches of applied computer science.

Static quantitative models are steady-state models described by algebraic
equations involving such relationships as those between stresses and
strains in mechanical structures or those among pressure, volume, and
temperature in thermodynamics. Moreover, dynamic modeling expresses some
proportional dependence, and it is natural to distinguish between a priori
models derived from physical principles, balance equations, and structural
interconnections, and a posteriori models derived from experimental data.
Both kinds of models may take on several different forms, although a priori
models tend to be organized as structured systems or networks with various
subsystems, components, and internal structure, whereas a posteriori models
tend to be formulated as behavioral models. As a posteriori models are
derived from experimental data, they often use abstract or
experiment-dependent parameterizations such as black box models (Chapter 2),
linear regression models (Chapter 5), or time series models (Chapter 6).
Another difference is that the a priori models often manifest clear physical
and causal relationships, whereas the data-oriented a posteriori models
express relationships, such as covariances between variables, in
statistical terms.

Linear systems have a particular significance in this comparison because both 
modeling and identification often presume linear, proportional relationships 
to exist between variables. The powerful principle of superposition then
simplifies the solution of the mathematical problems involved. In addition, the 
linear model is often useful as a small signal model around some equilibrium 
point. Linear systems comprise a number of highly different classes of models, 
and it is natural to distinguish, on the basis of their complexity, between 
certain alternative types of models: 

i. Composite models versus Submodels 

Obviously, this dichotomy is often a trivial one, since in composite models
the structure is determined by the model’s components and the level of
detail by the purpose of modeling. Methodologies to structure composite
models from interconnected submodels or components are closely related to
the analytic approach adopted in science and technology (see Chapter 7 for
some examples).

Dynamical systems with outputs that change over time are often subdivided 
into 

ii. Time domain models versus Frequency domain models,

where the Laplace transform, the Fourier transform, and the z-transform are 
the major analytic instruments used. 

An objective of modeling is to reduce uncertainties about the behavior of a
system, although the remaining sources of uncertainty also require some
modeling. In most cases it is customary to use probability theory and
stochastic processes to model uncertainty, interference of other subsystems,
or input variability according to some statistical distribution. Stochastic
properties can be evaluated in the framework of linear systems with additive
and multiplicative characteristics, ensemble, and temporal statistics
(Chapter 6). The differences in analytic approaches thus motivate a
distinction between
iii. Deterministic versus Stochastic systems

Subjective uncertainty (or ignorance) is often poorly modeled by stochastic
processes and thus requires other modeling. Recently there has been
increasing interest in fuzzy set theory as a means to model subjective
uncertainty, although this trend has hitherto had little impact on system
identification.

Worst-case uncertainties and modeling of hostile interference or damage
require modeling by other means, e.g., differential games where uncertain
interference is modeled and solved as an optimization problem.

Again, on the basis of complexity, it is natural to distinguish between 

iv. Single-input single-output (SISO) versus Multi-input multi-output (MIMO)
systems

This difference in modeling complexity reflects dependencies not only between 
input and output but also dependencies and interaction between the output 
variables. 

The mathematical methods of data analysis motivate the distinction between

v. Continuous-time versus Discrete-time models 

The continuous-time models are closer to physical considerations, whereas 
with discrete-time models system behavior is considered to be defined at a 
sequence of time-instants related to measurement. The discrete-time models 
are closely related to implementation problems of digital signal processing. 
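To illustrate how a discrete-time model relates to a continuous-time one, consider the first-order system $\dot{x}(t) = -a\,x(t) + b\,u(t)$ sampled with period $h$, with the input held constant between samples (a zero-order hold, which is an assumption made here). Integrating over one sampling interval gives the exact difference equation

$$
x(kh + h) = e^{-ah}\,x(kh) + \frac{b}{a}\left(1 - e^{-ah}\right)u(kh), \qquad a \neq 0,
$$

so the discrete-time model agrees with the continuous-time one at the sampling instants. The sketch below, with hypothetical names and values, compares it with the analytic step response $x(t) = (b/a)(1 - e^{-at})$.

```python
import math


def zoh_first_order(a: float, b: float, h: float) -> tuple[float, float]:
    """Zero-order-hold discretization of dx/dt = -a x + b u (a != 0).

    Returns (phi, gamma) such that x[k+1] = phi * x[k] + gamma * u[k].
    """
    phi = math.exp(-a * h)
    gamma = (b / a) * (1.0 - phi)
    return phi, gamma


def simulate(phi: float, gamma: float, u: list[float], x0: float = 0.0) -> list[float]:
    """Iterate the difference equation; returns x[0], ..., x[len(u)]."""
    x = [x0]
    for uk in u:
        x.append(phi * x[-1] + gamma * uk)
    return x


a, b, h, n = 2.0, 3.0, 0.1, 20          # hypothetical system and sampling period
phi, gamma = zoh_first_order(a, b, h)
x = simulate(phi, gamma, [1.0] * n)     # unit step input
for k in (0, 5, 10, 20):
    exact = (b / a) * (1.0 - math.exp(-a * k * h))
    print(f"t={k * h:4.1f}  discrete={x[k]:.6f}  continuous={exact:.6f}")
```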

It is possible to describe the model obtained with many very different
parameter sets, and it is sometimes a controversial and purpose-related
problem to evaluate the significance of the parameters finally chosen to
describe the model. For this reason it is standard to distinguish between

vi. Parametric versus Non-parametric models 

Many models have been defined by a given form but are dependent on a 
finite number of real parameters. Such models are said to be parameterized 
or parametric models, although there is no clear-cut distinction from other 
models such as spectra or impulse responses, which are sometimes referred to 
as non-parametric models. (A strict use of “non-parametric” would, however, 
refer to situations and experiments in which the outcomes are assigned to 
categories rather than measured on numerical scales.) Other distinctions 
such as that between structured and unstructured models would appear to be 
no more successful. 
Figure 1.3 Procedures of modeling and system identification.


Methods of introducing state variables and associated parameters, to
describe a certain external behavior and various equivalence classes
thereof, are commonly referred to as realization theory. As there are
usually several possible models to describe a given external behavior, it is
desirable to distinguish a minimal realization according to some complexity
criterion.


1.5 THE PROCEDURE OF IDENTIFICATION 

Identification has many aspects and phases, and it is customary to organize 
identification by considering a certain number of steps. We start with the 
object of identification designated 

S: A system to study 

It is also necessary to consider the application context, i.e., whether the
model derived is intended for scientific use or engineering design (e.g.,
simulation models, fault detection models, control systems analysis, and/or
human intervention) or for some other use. We designate this context

T: Purpose and problem formulation 

The experimental conditions and the reliability of the data must be
reported in support of an estimated model. This is designated

X: Experimental planning and operation to ensure that the prerequisites of
the identification methods are fulfilled in the experimental procedure.

Experimental design is often used to circumvent instrumental difficulties or 
to compensate for other inherent difficulties or interaction from other control 
loops. 

It is often natural to restrict the complexity of modeling to a certain model 
structure. The class of models thus adopted often belongs to some standard
category of models such as linear systems or ARMAX models associated with 
certain mathematical properties. Another standard approach often used is to 
make assumptions as to the physical nature of the system or other restrictions 
that define physical parameterizations. This we designate: 

M: A model set

The algorithms and the software used (including filtering, estimation
methods, numerical optimization) are often designated

I: Identification and parameter estimation methods

The quality and the limitations of an estimated model in the form of
statistical tests, simulations, etc., should be stated in support of the
model and are necessary in order to make modeling statements precise. Such
tests are referred to as

V: Model validation

The task of identification is to choose that element in a model class which
explains a given set of observations and as little else as possible. In the
spirit of Popper, such a model is called the most powerful unfalsified
model. The contexts S, T, X, M, I, and V thus represent stages or aspects of
interest during the identification procedure, and Fig. 1.3 illustrates some
of these relationships. The book is organized to cover these aspects
beginning with the identification of simple linear models (Chapters 2-6) and
proceeding with modeling issues (Chapters 7-10), followed by a review of
topics in structured modeling and identification in Chapters 11-15, as
outlined below.

Chapter 2 contains a presentation of some standard-type identification
methods for black box models such as transient analysis with test signals in
the form of impulse and step signals. In addition, frequency response
analysis with sinusoidal inputs and correlation analysis with white noise
input are treated.

Chapter 3 provides a systematic description of the use of the Laplace
transform, the Fourier transform, and the z-transform in the context of
signals and systems. The effects of discretization and finite measurement
time are also treated at this early stage, as they are fundamental
properties in the context of measurements. Spectra such as the
autospectrum, cross spectrum, power spectrum, and coherence spectrum, as
well as correlation and covariance functions, are defined, and their
dependence on input-output properties is treated.

Chapter 4 covers both spectral estimation techniques and some standard
modifications of these techniques intended to compensate for distortion due
to discretization and finite measurement times. In addition, covariance
analysis and frequency response analysis are examined.

Linear regression is explored in Chapter 5 with emphasis on the
least-squares problem as an optimization problem and on the statistical
properties of linear regression estimates. The problem of bias in the
context of least-squares identification is discussed. Finally, the discrete
Fourier transform is treated as a least-squares estimation problem.

Identification of time-series models in the form of autoregressive models,
moving average models, and their generalizations is examined in Chapter 6.
The problems of estimating a transfer function by means of time-series
models in terms of prediction error problems or output error problems are
examined. Lattice algorithms are also presented. Finally, some aspects of
application such as bias reduction, assessment of periodic phenomena, and
numerical optimization methods are discussed.

Physical modeling principles for the determination of structured models such 
as mechanical models, thermodynamic models, and compartment models are 
presented in Chapter 7 with such common concepts as potentials, gradients, 
and flows. Identifiability problems of parameters in physical models are also 
discussed. 

Principles and methods of experimental procedure are to be found in Chapter 
8. Problems of experimental conditions with respect to the choice of input are 
discussed, and some special topics such as identification of systems in closed- 
loop operation are analyzed. The conclusion of Chapter 8 offers practical hints 
on the planning of experiments. 



Model validation techniques are considered in Chapter 9, which covers the role 
of fulfilled method prerequisites, coherence test, model order determination, 
residual correlation tests, and other statistical tests. 

Model approximation and model reduction methods are considered in Chapter
10, with some emphasis on the balanced realizations applicable to linear
systems.

Real-time identification and recursive algorithms are considered in Chapter 11 
and provide a necessary basis for adaptive and learning systems that operate 
in real-time conditions. 

Continuous-time linear models are covered in Chapter 12, with further
discussion of structural identifiability as motivated by physical modeling.

Multidimensional modeling and identification methods are treated in Chapter
13, including some methods for certain complex systems described by
delay-differential equations or partial differential equations.

Examples of nonlinear system modeling and identification are discussed in 
Chapter 14. 

Adaptive systems in Chapter 15 are related to identification in that they
incorporate real-time identification mechanisms permitting adaptation to
parameter changes. Some important classes of adaptive systems are reviewed
from the perspective of identification.

The appendices comprise basic linear algebra, time-series analysis, statistical 
inference, numerical optimization, and a case study. 


1.6 HISTORICAL REMARKS AND BIBLIOGRAPHY 

As early as the fourth century before our present era, Aristotle recognized
the importance of numerical and geometrical relations within science, and he
singled out astronomy, optics, harmonics, and mechanics as sciences whose
subject matter is mathematical relationships among physical objects. A few
centuries later, Ptolemy (c. A.D. 85-165), who described planetary motion,
emphasized that more than one mathematical model can be constructed to
describe the astronomical observations. As two models may be mathematically
equivalent, the scientist is at liberty to employ whichever model is more
convenient.

In the fourteenth century, William of Occam used simplicity as a criterion
of concept formation and modeling. According to his view, it is desirable to
eliminate superfluous concepts so that the simpler of two theories that account
for a type of phenomenon is to be preferred. This methodological principle
favoring low model complexity has often been referred to as “Occam’s razor.”

Dynamic modeling and identification as a methodology dates back to Galileo
(1564-1642), who also is important as the founder of dynamics. In addition,
Galileo made contributions to the design and application of scientific
instruments such as the telescope, the thermometer, and the clock.

Hence, system identification and modeling have several different roots and
belong with equal right to mathematics, statistics, computer science, control
theory, systems analysis, and signal processing, with applications in
technology, natural sciences, and econometrics. Recent years have witnessed
continuing parallel development in statistics, econometrics, speech
processing, geophysics, and structural mechanics.

The standard treatment of spectral analysis is to be found in the following
reference:

- G.M. Jenkins and D.G. Watts, Spectral Analysis and Its Applications.
San Francisco: Holden-Day, 1968.

System identification with some focus on time series analysis is presented in
the following works:

- G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and
Control. San Francisco: Holden-Day, 1970.

- P.E. Caines, Linear Stochastic Systems. New York: John Wiley, 1988.

- P. Eykhoff, System Identification: Parameter and State Estimation.
London: John Wiley, 1974.

- G.C. Goodwin and R.L. Payne, Dynamic System Identification: Experiment
Design and Data Analysis. New York: Academic Press, 1977.

- L. Ljung, System Identification: Theory for the User. Englewood Cliffs:
Prentice-Hall, 1987.

- T. Söderström and P. Stoica, System Identification. London: Prentice-Hall
Int., 1989.

Although the purposes of system modeling and identification are similar
throughout the world, philosophical attitudes differ perceptibly. The American
and British cultural spheres and Scandinavia, with their tradition of analytic
and empiricist philosophy, sometimes tend to emphasize statistical aspects,
whereas scientists in continental Europe with a strong background of
idealistic philosophy more often emphasize modeling and approximation
approaches.

Some philosophical background to scientific modeling is to be found in the
following works:

- K.R. Popper, Conjectures and Refutations. London: Harper and Row, 
1963. 

- P. Feyerabend, Against Method, 2d ed. London: Verso, 1988. 

Some of the philosophy of K.R. Popper adapted to the context of identification
is to be found in the following essays:

- J.C. Willems, “From time series to linear systems,” Automatica. “Part
I: Finite dimensional linear time invariant systems,” Vol. 22, 1986, pp.
561-580. “Part II: Exact modelling,” Vol. 22, 1986, pp. 675-694.
“Part III: Approximate modelling,” Vol. 23, 1987, pp. 87-115.

Suitable background material is provided in any of the three works below:

- K.J. Åström and B. Wittenmark, Computer-Controlled Systems, 2d ed.
Englewood Cliffs, NJ: Prentice-Hall, 1990.

- R. Isermann, Digital Control Systems, Vols. I-II. Berlin and Heidelberg:
Springer-Verlag, 1991.

- R.H. Middleton and G.C. Goodwin, Digital Control and Estimation: A 
Unified Approach. Englewood Cliffs, NJ: Prentice-Hall, 1990. 

Important scientific journals covering topics in identification include
Advances in Applied Probability, Annals of Statistics, Automatica,
Econometrica, Econometric Theory, IEEE Transactions on Automatic Control,
IEEE Transactions on Signal Processing, Journal of Econometrics, and Journal
of Time Series Analysis. Further references are provided in each chapter.

Black Box Models


2.1 INTRODUCTION 

This chapter reviews some representative methods for simple analysis of
dynamical systems, such as black box models fitted to data from transient
response analysis. Frequency response analysis with sinusoidal inputs and
correlation analysis with white noise input are also covered.

The impulse response, or weighting function, is fundamental in the
description of a linear system because it gives a complete characterization
of the input-output map of a linear time-invariant system. Let the impulse
response be designated g(t) and consider the integral

$$ y(t) = \int_0^{\infty} g(\tau)\, u(t-\tau)\, d\tau \qquad (2.1) $$

The function g determines a causal relationship between input u and output 
y if g(t) = 0 for t < 0. Assuming the system to be at rest at time t = 0, it 
follows that application of a test signal u in the form of an impulse 

$$ u(t) = \delta(t) \qquad (2.2) $$

gives the response

$$ y(t) = \int_0^{\infty} g(\tau)\, \delta(t-\tau)\, d\tau = g(t) \qquad (2.3) $$


which, in turn, justifies the use of the term impulse response. Moreover, if
the input is chosen as $u(t) = \alpha u_1(t) + \beta u_2(t)$ for constants
$\alpha$ and $\beta$, we find that

$$ y(t) = \int_0^{\infty} g(\tau)\, u(t-\tau)\, d\tau = \alpha \int_0^{\infty} g(\tau)\, u_1(t-\tau)\, d\tau + \beta \int_0^{\infty} g(\tau)\, u_2(t-\tau)\, d\tau \qquad (2.4) $$

from which it is clear that the system is linear in its response. The linear
dynamic properties of (2.1)-(2.4) are also of interest for the purpose of
approximating nonlinear systems with linear ones. Such approximations may be
valid at least as small-signal models.
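To make the superposition property (2.4) concrete, the following Python sketch
approximates the convolution integral (2.1) by a Riemann sum with step h and
checks numerically that the response to $\alpha u_1 + \beta u_2$ equals
$\alpha y_1 + \beta y_2$. The weighting function $e^{-t}$, the inputs, the step
size, and the helper name `convolve_response` are illustrative assumptions, not
taken from the text.

```python
import math


def convolve_response(g, u, h):
    """Riemann-sum approximation of y(t_n) = int_0^inf g(tau) u(t_n - tau) dtau.

    g, u: samples g(k*h), u(k*h) for k = 0, 1, ...; u is taken as zero for
    t < 0 (system at rest at t = 0), so only tau <= t_n contributes.
    """
    y = []
    for n in range(len(u)):
        acc = 0.0
        for k in range(min(n + 1, len(g))):
            acc += g[k] * u[n - k]
        y.append(h * acc)
    return y


h = 0.01
t = [k * h for k in range(500)]
g = [math.exp(-tk) for tk in t]            # hypothetical causal weighting function
u1 = [math.sin(2.0 * tk) for tk in t]      # sinusoidal input
u2 = [1.0 for _ in t]                      # unit step input
alpha, beta = 2.0, -0.5

y_combined = convolve_response(g, [alpha * a + beta * b for a, b in zip(u1, u2)], h)
y_superposed = [alpha * a + beta * b
                for a, b in zip(convolve_response(g, u1, h), convolve_response(g, u2, h))]

max_diff = max(abs(a - b) for a, b in zip(y_combined, y_superposed))
print(f"max deviation from superposition: {max_diff:.2e}")
```

The deviation is at the level of floating-point rounding, since the discretized
convolution is itself a linear map.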


2.2 TRANSIENT RESPONSE ANALYSIS 

An identification methodology based on the application of an impulse to the
input of a system is known as impulse response analysis. A similar method
is to observe the transient output behavior produced by the system proceeding
from a given initial condition (initial condition response). Several other
identification methods, such as step response analysis, are based on similar
special choices of inputs and together constitute a class of methods known
as transient response analysis. Such methods are often simple to apply and
understand, and they often provide information good enough for estimates of
input-output gain, dominating time constants, and time delays. These
properties make the methods suitable for first-stage experiments to prepare
for other experiments in system identification.

Figure 2.1 Similar impulse responses for two weighting functions with different
step responses. Samples collected every second are shown as markers (e.g.,
‘o’). The example demonstrates the difficulty in obtaining correct
low-frequency properties from impulse response tests.

In order to demonstrate properties of impulse response analysis we may begin 
with two examples: 

Example 2.1—Impulse response analysis 
Consider the two weighting functions 

$$ g_1(t) = 1.05 \cdot 0.4^{t}, \qquad g_2(t) = 1.2 \cdot 0.5^{t} - 0.2 \cdot 0.75^{t} \qquad (2.5) $$

which are shown in Fig. 2.1 together with samples collected every second and 
their corresponding step responses. The two data sequences obtained from the 
impulse responses are virtually indistinguishable. Nevertheless, they result 
in considerable differences in the corresponding step responses. This example
thus illustrates the problem of finding accurate estimates of the static (and
low-frequency) properties from an impulse response. ■

Figure 2.2 Similar impulse responses for two weighting functions of different
initial responses and samples of the impulse responses with period h = 1. The
example demonstrates the difficulty in obtaining correct high-frequency
properties from impulse response tests. In particular, sampled impulse
responses exhibit this weakness.
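The following Python sketch evaluates the two weighting functions of
Example 2.1, in the form reconstructed in (2.5) from the original typesetting,
at one-second sampling instants and compares their static gains, i.e., the
final values $\int_0^\infty g(t)\,dt$ of the step responses. For $c \cdot a^t$
with $0 < a < 1$ this integral is $-c/\ln a$. The function names are
hypothetical.

```python
import math


def g1(t):
    """First weighting function of Example 2.1 (as reconstructed)."""
    return 1.05 * 0.4 ** t


def g2(t):
    """Second weighting function of Example 2.1 (as reconstructed)."""
    return 1.2 * 0.5 ** t - 0.2 * 0.75 ** t


# Samples taken once per second, as in Fig. 2.1
samples = [(k, g1(k), g2(k)) for k in range(11)]
max_sample_diff = max(abs(a - b) for _, a, b in samples)

# Static gain = int_0^inf g(t) dt; for c * a**t this equals -c / ln(a)
gain1 = -1.05 / math.log(0.4)
gain2 = -1.2 / math.log(0.5) + 0.2 / math.log(0.75)

for k, a, b in samples:
    print(f"t={k:2d}  g1={a:+.4f}  g2={b:+.4f}")
print(f"largest sample difference: {max_sample_diff:.3f}")
print(f"static gains: {gain1:.3f} vs {gain2:.3f}")
```

The samples never differ by more than about 0.05, while the static gains differ
by about 0.11, roughly 10%. The difference is carried by the small, slowly
decaying tail of $g_2$, which is easily lost in noise.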

The next example demonstrates the difficulties in determining high-frequency
properties.

Example 2.2—Impulse response analysis 
Consider the two impulse responses 

$$ g_3(t) = 0.7^{t} - 0.1^{t}, \qquad g_4(t) = 0.8 \cdot 0.73^{t} \qquad (2.6) $$

These signals and their corresponding step responses are shown in Fig. 2.2
along with samples of the two impulse responses collected every second.
Notice that the two sequences of samples are almost indistinguishable despite
the marked initial difference between the two impulse responses. An adequate
assessment of the initial response would require much more frequent sampling
than was used in this example. Also, the synchronization between the
generation of the input, i.e., the impulse, and the recording of the response
requires attention for correct estimation of the initial properties. ■
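A similar Python sketch for Example 2.2, using $g_3$ and $g_4$ as reconstructed
in (2.6), compares samples taken at t = 1, 2, ..., 10 s with a finer grid over
the first second. Starting the samples one period after the impulse is an
assumption; it mimics a record in which the initial response falls between
samples or is lost through poor synchronization.

```python
def g3(t):
    """First impulse response of Example 2.2 (as reconstructed)."""
    return 0.7 ** t - 0.1 ** t


def g4(t):
    """Second impulse response of Example 2.2 (as reconstructed)."""
    return 0.8 * 0.73 ** t


# Samples once per second, starting one period after the impulse
coarse = [abs(g3(k) - g4(k)) for k in range(1, 11)]
# Fine grid (step 0.05 s) over the first second, where the responses really differ
fine = [abs(g3(0.05 * j) - g4(0.05 * j)) for j in range(21)]

print(f"max |g3 - g4| at t = 1, 2, ..., 10: {max(coarse):.3f}")
print(f"max |g3 - g4| on [0, 1] (step 0.05): {max(fine):.3f}")
```

The coarse samples differ by at most about 0.05, whereas at t = 0 the two
responses differ by 0.8; only much denser sampling near the impulse reveals it.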

As a conclusion it might be stated that it is difficult to handle both low- 
and high-frequency properties and synchronization effects in impulse response 
analysis even under noise-free conditions. Another problem is how to perform 
impulse response analysis in the presence of saturations and nonlinearities. 
In fact, the following practical problems of impulse response analysis can be 
listed: 

- Restriction to stable systems
- Difficulties in generating impulses
- Dynamics of sample and hold circuits
- Synchronization between impulse and sampling
- Difficulties for the system in managing inputs of large magnitude
- Saturations
- Nonlinearities
- Difficulties in handling the “tails” of responses due to their long
duration and low amplitudes, with consequent problems of quantization and
numerical accuracy
- Sensitivity to noise

Obviously, several of these problems depend on experimental conditions.

Step response analysis 

A complement to impulse response analysis is step response analysis, which is
often easier to perform than impulse response analysis. The test signal is

$$ u(t) = \begin{cases} 0, & t < 0 \\ 1, & t \ge 0 \end{cases} \qquad (2.7) $$

Under noise-free conditions, a step input to a system described by Eq. (2.1)
generates the response

$$ y(t) = \int_0^{t} g(\tau)\, d\tau \qquad (2.8) $$

The step response test has an advantage in that it provides a good estimate
of any static gain. It is also possible to obtain an estimate of the weighting
function g(t) via differentiation of the step response

$$ \hat g(t) = \frac{dy(t)}{dt} \qquad (2.9) $$

which according to Eq. (2.8) is equal to g(t). Such a calculation involves
differentiation of the recorded output, which amplifies measurement noise; the
accuracy of the calculation may be improved by bandpass filtering.

Figure 2.3 Frequency response analysis.
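Differentiating a recorded step response as in Eq. (2.9) amplifies measurement
noise. The Python sketch below assumes a hypothetical system with
$g(t) = e^{-t}$, adds Gaussian measurement noise of standard deviation
$10^{-3}$ to its step response, differentiates by central differences, and then
smooths the result with a moving average. The moving average is a simple
low-pass filter standing in for the filtering mentioned in the text; all
numerical values are illustrative.

```python
import math
import random

random.seed(1)
h = 0.01
t = [k * h for k in range(1001)]
g_true = [math.exp(-tk) for tk in t]                          # hypothetical g(t)
y = [1.0 - math.exp(-tk) + random.gauss(0.0, 1e-3) for tk in t]  # noisy step response


def central_difference(y, h):
    """Estimate dy/dt at interior sample points, cf. Eq. (2.9)."""
    return [(y[k + 1] - y[k - 1]) / (2.0 * h) for k in range(1, len(y) - 1)]


def moving_average(x, width):
    """Centered moving average; the window shrinks near the ends."""
    half = width // 2
    out = []
    for k in range(len(x)):
        window = x[max(0, k - half): k + half + 1]
        out.append(sum(window) / len(window))
    return out


def rms(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


g_raw = central_difference(y, h)
g_smooth = moving_average(g_raw, 21)
ref = g_true[1:-1]                      # true g at the interior points

err_raw = rms([a - b for a, b in zip(g_raw, ref)])
err_smooth = rms([a - b for a, b in zip(g_smooth, ref)])
print(f"RMS error, raw derivative:      {err_raw:.4f}")
print(f"RMS error, smoothed derivative: {err_smooth:.4f}")
```

The noise of only $10^{-3}$ in the step response becomes an error of about 0.07
in the raw derivative; smoothing reduces it by an order of magnitude at the
price of some bias where g(t) changes quickly.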

The above examples illustrate some of the fundamental problems of system
identification, including the specific question of how to choose experimental
conditions (experiment duration, sampling frequency, test signals, data
filtering, computation, etc.). Subsequently, it is necessary to select an
appropriate level of model complexity. The answers to such questions depend
on the purpose of system identification.


2.3 FREQUENCY RESPONSE ANALYSIS


The basis of frequency response analysis as an identification principle may
be outlined as follows. Assuming the identification object to be a linear
time-invariant dynamic system, it can be described by some weighting function
g(t). The Laplace transform $\mathcal{L}\{\cdot\}$ of the weighting function
g(t) provides a transfer function $G(s) = \mathcal{L}\{g(t)\}$ that relates the
Laplace-transformed input $U(s) = \mathcal{L}\{u(t)\}$ to the output
$Y(s) = \mathcal{L}\{y(t)\}$:

$$ y(t) = \int_0^{\infty} g(\tau)\, u(t-\tau)\, d\tau, \qquad Y(s) = G(s)\, U(s) \qquad (2.10) $$

Assuming that we can ignore the transient from initial conditions and only
consider the input-output response, the steady-state response of a stable
system to a sinusoidal input

$$ u(t) = u_1 \sin \omega t \qquad (2.11) $$

is characterized by the gain $|G(i\omega)|$ and the phase shift $\phi(\omega)$:

$$ y(t) = |G(i\omega)|\, u_1 \sin(\omega t + \phi(\omega)), \qquad \phi(\omega) = \arg G(i\omega) \qquad (2.12) $$

Simple observation of a sinusoidal input plotted against the corresponding
output response allows direct computation of the gain and phase shift from
Lissajous figures. Results are presented in the form of a Bode diagram, where
gain and phase shift are plotted against frequency in a specified range.
However, all Lissajous-type methods are sensitive to disturbances and
transients, and they provide no statistical averaging over the measurement
interval.

A better approach to reducing the effect of disturbances is the following
averaging method (sometimes also called the correlation method of frequency
response analysis; see Fig. 2.3). The output of the system subject to
frequency analysis is multiplied by sinusoids at the frequency of the input,
and the products are integrated over a specified measurement interval T. To
minimize the effect of disturbances, the measurement duration T is always
chosen as a number k of full periods of the periodic test signal
$u(t) = u_1 \sin \omega t$, so that $T = k \cdot 2\pi/\omega$. The outputs from
these computations are obtained as follows. The “sine channel” provides

$$ s_T(\omega) = \int_0^T y(t) \sin \omega t \, dt = \frac{1}{2} T\, |G(i\omega)|\, u_1 \cos \phi(\omega), \qquad T = k \frac{2\pi}{\omega} \qquad (2.13) $$

whereas the “cosine channel” provides the signal

$$ c_T(\omega) = \int_0^T y(t) \cos \omega t \, dt = \frac{1}{2} T\, |G(i\omega)|\, u_1 \sin \phi(\omega) \qquad (2.14) $$

Estimates of the gain $|G(i\omega)|$ and the phase shift $\phi(\omega)$ are
obtained as

$$ |\hat G(i\omega)| = \frac{2}{u_1 T} \sqrt{s_T^2(\omega) + c_T^2(\omega)} \quad \text{and} \quad \hat\phi(\omega) = \arctan \frac{c_T(\omega)}{s_T(\omega)} + k\pi \qquad (2.15) $$

Hence, frequency response analysis can be viewed as an instrumentation of
the Fourier transform (or as the computation of coefficients of a Fourier
series expansion). The measurement intervals are usually specified in
absolute time or as a number of periods of the frequency investigated.
Another standard feature is to avoid effects of non-zero initial conditions
by introducing a time delay between the application of a new test frequency
and the start of the measurement interval.

Figure 2.4 Transfer function between input voltage u and the speed $\dot q$
(‘o’) and the position q of a DC motor as obtained from frequency response
analysis. To avoid inaccuracy due to transients from initial conditions, each
input of a new test frequency is started 5 s before measurement is started.
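The correlation method (2.13)-(2.15) can be sketched in a few lines of Python.
The sketch assumes the hypothetical first-order system $G(s) = 1/(s+1)$,
simulated exactly between samples with the input held constant over each step
of length dt, waits a settling time before measuring, and integrates over k
full periods. `math.atan2` serves as a quadrant-correct form of
$\arctan(c_T/s_T) + k\pi$. Function names and numerical settings are
assumptions for illustration.

```python
import math


def simulate_first_order(u, dt):
    """Simulate dy/dt = -y + u, i.e. G(s) = 1/(s+1), from rest.

    The input is held constant over each step, so the update is exact
    for the held input.
    """
    a = math.exp(-dt)
    y, out = 0.0, []
    for uk in u:
        out.append(y)
        y = a * y + (1.0 - a) * uk
    return out


def correlation_estimate(omega, u1=1.0, k=10, settle=5.0, dt=1e-3):
    """Gain and phase of G(i*omega) from the sine/cosine channels (2.13)-(2.15)."""
    T = k * 2.0 * math.pi / omega               # k full periods
    n_settle = int(round(settle / dt))          # let the initial transient die out
    n_meas = int(round(T / dt))
    t = [j * dt for j in range(n_settle + n_meas)]
    y = simulate_first_order([u1 * math.sin(omega * tj) for tj in t], dt)
    s_T = c_T = 0.0
    for j in range(n_settle, n_settle + n_meas):
        s_T += y[j] * math.sin(omega * t[j]) * dt
        c_T += y[j] * math.cos(omega * t[j]) * dt
    gain = 2.0 / (u1 * T) * math.hypot(s_T, c_T)
    phase = math.atan2(c_T, s_T)
    return gain, phase


for omega in (0.5, 1.0, 5.0):
    gain, phase = correlation_estimate(omega)
    print(f"omega={omega:4.1f}  gain={gain:.4f} (exact {1 / math.hypot(1, omega):.4f})  "
          f"phase={phase:+.4f} (exact {-math.atan(omega):+.4f})")
```

Because the reference sinusoids use absolute time, the measurement window may
start after the settling delay without affecting the phase, as long as it
covers whole periods.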

Presentation of results 

Experimental results with estimates of $|G(i\omega)|$ and $\phi(\omega)$ over
some frequency range are generally presented graphically in the form of Bode,
Nyquist, or Nichols diagrams. The Bode diagram contains the gain and phase
response versus frequency, with the gain represented in a log-log diagram and
the phase shift in a log-lin diagram, and is the standard representation of
frequency response analysis (Fig. 2.4).

The Nyquist diagram is a polar diagram containing the gain as the radial
coordinate and the phase shift as the angular coordinate (Fig. 2.5). This
representation is specifically used in control systems analysis. Its
Cartesian coordinates follow directly from the sine and cosine channels
(2.13)-(2.14):

$$ \mathrm{Re}\, G(i\omega) = \frac{2\, s_T(\omega)}{u_1 T}, \qquad \mathrm{Im}\, G(i\omega) = \frac{2\, c_T(\omega)}{u_1 T} \qquad (2.16) $$

Figure 2.5 Nyquist diagrams (panels: Velocity, Position; axes Re G(iω) and
Im G(iω)) representing the transfer function of DC-motor velocity and position
as obtained from frequency response analysis (cf. Fig. 2.4). To avoid
inaccuracy due to transients from initial conditions, each input of a new test
frequency is started 5 s before measurement is started.


Example 2.3—Frequency response analysis of a DC motor 
Consider frequency response analysis of a DC motor from the input voltage
u to the outputs measured as angular position q and angular velocity
$v = \dot q$ (Fig. 2.6) of the shaft. The Bode diagram of Fig. 2.4 displays
the result of frequency response analysis of the DC-motor velocity over the
frequency range 0.01-40 Hz.

To avoid inaccuracy due to transients from initial conditions, the DC motor
input was exposed to each new test frequency for 5 s before measurement was
started. The result is also presented in the form of a Nyquist diagram
(Fig. 2.5). ■

Figure 2.7 Nichols diagram of the transfer function between input voltage u and
the speed $\dot q$ and the position q of a DC motor as obtained from frequency
response analysis in Example 2.3.

The Nichols diagram (Fig. 2.7) depicts the frequency response with the gain
$\log |G(i\omega)|$ on the vertical axis versus the phase shift $\phi(\omega)$
on the horizontal axis. The Nichols diagram contains level curves of
$|G/(1+G)|$ and $\arg G/(1+G)$, which are valuable in control systems analysis
for quantification of stability margins.
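Given one complex estimate $G(i\omega)$, for example from (2.16), the
coordinates used in the three diagram types, together with the closed-loop
quantity $|G/(1+G)|$ read from the Nichols level curves, follow directly. In
the Python sketch below, $G(s) = 1/(s(s+1))$ evaluated at $\omega = 1$ rad/s is
a hypothetical motor-position model, not the measured DC motor of Example 2.3,
and unity feedback is assumed for the closed loop.

```python
import cmath
import math


def from_channels(s_T, c_T, u1, T):
    """Eq. (2.16): G(i*omega) from the sine and cosine channel outputs."""
    return complex(2.0 * s_T / (u1 * T), 2.0 * c_T / (u1 * T))


def representations(G):
    """Coordinates of one frequency point in the three diagram types."""
    gain_db = 20.0 * math.log10(abs(G))
    phase_deg = math.degrees(cmath.phase(G))
    closed = G / (1.0 + G)                  # unity-feedback closed loop
    return {
        "bode": (gain_db, phase_deg),       # gain [dB] and phase [deg] vs frequency
        "nyquist": (G.real, G.imag),        # point in the complex plane
        "nichols": (phase_deg, gain_db),    # phase on x-axis, gain on y-axis
        "closed_loop_db": 20.0 * math.log10(abs(closed)),
    }


G = 1.0 / (1j * (1j + 1.0))                 # hypothetical 1/(s(s+1)) at s = i
print(representations(G))
```

At this point the open-loop gain is about -3 dB at -135°, and the closed-loop
magnitude is exactly 0 dB.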

Sensitivity to disturbances

There are some obvious sources of disturbances:

- Disturbances (constant; white noise) acting on the output
- Disturbances (trends) acting on the output
- Sampling interference
- Unmodeled nonlinearities
- The presence of higher-order harmonics

Assume the output y to be corrupted by a disturbance v so that

$$ y(t) = |G(i\omega)|\, u_1 \sin(\omega t + \phi(\omega)) + v(t), \qquad \phi(\omega) = \arg G(i\omega) \qquad (2.17) $$

This type of disturbance gives rise to errors in the sine and cosine channels
of the form

$$ \Delta s_T = \int_0^T \sin(\omega t)\, v(t)\, dt, \qquad \Delta c_T = \int_0^T \cos(\omega t)\, v(t)\, dt \qquad (2.18) $$

Since T spans an integer number of periods, the integrals of
$\sin \omega t$ and $\cos \omega t$ over [0, T] vanish, so the error due to a
constant disturbance is zero. Sensitivity is also low for white noise and
band-limited noise. The relative error of the transfer function estimate can
be estimated as

$$ \frac{|\Delta G(i\omega)|}{|G(i\omega)|} \approx \sigma_v \sqrt{\frac{\omega}{2 k \omega_c}} \qquad (2.19) $$

where $\sigma_v^2$ is the variance of the output disturbance v, $\omega_c$ the
bandwidth of the disturbance, and k the number of full periods of the sinusoid
completed during the measurement time. Weak performance may be expected in the
presence of ramp disturbances and periodic disturbances at the same frequency
as the test frequency. Hence, the correlation method of frequency response
analysis effectively eliminates the effect of constant disturbances but
unfortunately does not effectively eliminate that of trend disturbances. To
reduce estimation error due to trend disturbances, more elaborate versions of
the correlation method are needed.
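These qualitative statements can be checked with a short Python sketch of the
disturbance terms (2.18). It assumes a test frequency $\omega = 1$ rad/s, the
hypothetical plant gain $|G(i)| = 1/\sqrt{2}$, 200 samples per period, and
three illustrative disturbances: a constant, a ramp, and independent Gaussian
samples as a discrete stand-in for white noise. It does not attempt to
reproduce the constants in (2.19).

```python
import math
import random

OMEGA = 1.0                        # test frequency [rad/s] (assumption)
U1 = 1.0                           # test-signal amplitude (assumption)
GAIN = 1.0 / math.sqrt(2.0)        # |G(i)| for the hypothetical G(s) = 1/(s+1)
N_PER_PERIOD = 200                 # samples per period of the test signal
DT = 2.0 * math.pi / (OMEGA * N_PER_PERIOD)


def relative_error(v_samples):
    """|Delta G| / |G| caused by the disturbance terms (2.18).

    v_samples must span an integer number of test-signal periods.
    """
    T = len(v_samples) * DT
    ds = dc = 0.0
    for j, vj in enumerate(v_samples):
        ds += vj * math.sin(OMEGA * j * DT) * DT
        dc += vj * math.cos(OMEGA * j * DT) * DT
    return 2.0 / (U1 * T) * math.hypot(ds, dc) / GAIN


def sampled(v, k):
    """Samples of a deterministic disturbance v(t) over k full periods."""
    return [v(j * DT) for j in range(k * N_PER_PERIOD)]


def noise_error(k, trials=10, sigma=1.0):
    """Average relative error for independent Gaussian samples (white-noise model)."""
    total = 0.0
    for _ in range(trials):
        total += relative_error([random.gauss(0.0, sigma) for _ in range(k * N_PER_PERIOD)])
    return total / trials


random.seed(0)
for k in (1, 10, 100):
    const_err = relative_error(sampled(lambda t: 0.5, k))
    trend_err = relative_error(sampled(lambda t: 0.05 * t, k))
    print(f"k={k:3d}  constant: {const_err:.1e}  trend: {trend_err:.3f}  "
          f"noise: {noise_error(k):.3f}")
```

The constant disturbance leaves only rounding error, the noise error falls
roughly as $1/\sqrt{k}$, while the ramp error stays near 0.14 for every k,
because the ramp grows in proportion to the measurement time.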


Example 2.4—Sensitivity to sinusoidal disturbances 
Consider a first-order system 


$$ Y(s) = G(s)\, U(s) + V(s) \qquad (2.20) $$

with output Y(s), input U(s), a disturbance V(s), and the transfer function

$$ G(s) = \frac{1}{s+1} \qquad (2.21) $$

The disturbance v(t) is assumed to be sinusoidal with frequency 10 Hz and
with the same amplitude as that of the input test signals u(t). The result
of frequency response analysis is seen in Fig. 2.8, where both the
experimental values and the theoretically calculated transfer function are
shown. The disturbance has a strong impact on the experimental result at the
frequency of the disturbance, i.e., at 10 Hz or 62.8 rad/s. The disturbance
is, however, noticeable over a large frequency interval and corrupts both gain
and phase measurements. A longer measurement time T tends to narrow the
corrupted frequency interval, resulting in a narrow-band error.

Figure 2.8 Frequency response analysis of a transfer function G(s) = 1/(s + 1)
when subject to a sinusoidal disturbance at 10 Hz = 62.8 rad/s with
signal-to-noise ratio S/N = 1. Frequency axis in rad/s.
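Example 2.4 can be mimicked numerically. The Python sketch below assumes that
the output has already reached steady state, adds a disturbance
$u_1 \sin(2\pi \cdot 10\, t)$ with the same amplitude as the test signal, and
applies the correlation estimator (2.13)-(2.16) over k full periods. The sample
counts, the zero disturbance phase, and the chosen test frequencies are
assumptions.

```python
import cmath
import math

OMEGA_V = 2.0 * math.pi * 10.0      # disturbance frequency: 10 Hz = 62.8 rad/s


def G(s):
    """G(s) = 1/(s + 1), Eq. (2.21)."""
    return 1.0 / (s + 1.0)


def estimate(omega, k, u1=1.0, n_per_period=400):
    """Correlation estimate of G(i*omega) from the steady-state output
    plus a 10 Hz sinusoidal disturbance of amplitude u1 (S/N = 1)."""
    g = G(1j * omega)
    gain, phase = abs(g), cmath.phase(g)
    dt = 2.0 * math.pi / (omega * n_per_period)
    n = k * n_per_period
    s_T = c_T = 0.0
    for j in range(n):
        t = j * dt
        y = gain * u1 * math.sin(omega * t + phase) + u1 * math.sin(OMEGA_V * t)
        s_T += y * math.sin(omega * t) * dt
        c_T += y * math.cos(omega * t) * dt
    T = n * dt
    return complex(2.0 * s_T / (u1 * T), 2.0 * c_T / (u1 * T))


for omega in (1.0, 10.0, 50.0, 60.0, 100.0):
    for k in (10, 100):
        rel = abs(estimate(omega, k) - G(1j * omega)) / abs(G(1j * omega))
        print(f"omega={omega:6.1f} rad/s  k={k:3d}  relative error={rel:.3f}")
```

Far below 62.8 rad/s the error is negligible; close to it the relative error
exceeds 100%, partly because $|G(i\omega)|$ is small there. Increasing k
narrows the affected band.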


Example 2.5—Sampling interference in frequency response analysis 
It is, of course, often natural to implement frequency response analysis by 
discrete-time methods where the sinusoidal inputs and outputs are sampled. 
Figure 2.9 shows the result of frequency response analysis applied to
identification of the transfer function G(s) = 1/(s + 1) sampled at a sampling
frequency of 10 Hz. There is clearly a significant distortion of the transfer
function estimate in the frequency range 1-10 Hz. This style of
implementation thus requires a sufficiently high sampling frequency,
vis-à-vis the frequency range to be investigated, in order to avoid
detrimental sampling interference. ■

Figure 2.9 Frequency response analysis of a transfer function G(s) = 1/(s + 1)
when implemented by discrete-time methods at 10 Hz sampling rate (solid line)
as compared to the true transfer function (dotted line). Notice that the
estimate is poor at high frequencies due to sampling interference. Frequency
axis in Hz.
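The text does not specify how the discrete-time implementation was built. One
plausible mechanism, stated here as an assumption, is that the input is
applied through a zero-order hold and the output is sampled, so the analysis
effectively measures the zero-order-hold equivalent of $G(s) = 1/(s+1)$,
namely $(1 - e^{-h})/(z - e^{-h})$ with $z = e^{i\omega h}$, instead of
$G(i\omega)$. The Python sketch below compares the two for h = 0.1 s up to
4 Hz, just below the 5 Hz Nyquist frequency.

```python
import cmath
import math

H = 0.1                      # sampling period for 10 Hz sampling
A = math.exp(-H)


def G(s):
    """True continuous-time transfer function 1/(s + 1)."""
    return 1.0 / (s + 1.0)


def G_zoh(omega):
    """Zero-order-hold equivalent (1 - e^{-h}) / (z - e^{-h}) at z = e^{i*omega*h}."""
    z = cmath.exp(1j * omega * H)
    return (1.0 - A) / (z - A)


for f in (0.01, 0.1, 1.0, 2.0, 4.0):
    omega = 2.0 * math.pi * f
    g, gd = G(1j * omega), G_zoh(omega)
    print(f"f={f:5.2f} Hz  |G|={abs(g):.4f}  |G_zoh|={abs(gd):.4f}  "
          f"phase difference={math.degrees(cmath.phase(gd) - cmath.phase(g)):+7.2f} deg")
```

At low frequencies the two agree closely. Approaching the Nyquist frequency,
both gain and phase deviate strongly.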

Frequency response analysis is applicable to linear time-invariant systems but
may have difficulty in providing meaningful results when applied to nonlinear
systems. A difficulty associated with nonlinearities is the ambiguity problem
of jump resonances, which may appear with certain saturation nonlinearities.
Systems with jump resonances may give irreproducible results that depend on
the sequence of frequency response tests, i.e., results obtained when the
experiments are performed with increasing test frequency may differ from those
obtained with decreasing test frequency. Another problem is the dependence of
the frequency response analysis on the input magnitude, an issue discussed in
greater detail in conjunction with the describing function method (see
Chapter 10 on model approximation and model reduction).

Problems of another type arise in applications to time-varying systems, where
distorting interference may appear. In particular, all frequency response
analyses applied to periodically sampled and controlled systems are
error-prone due to interference between the test frequency used and the
frequency of periodic time-variant behavior. For instance, periodic systems
such as combustion engines, thyristor-controlled systems, and other pulse-width
modulated systems may suffer from this type of problem. Synchronization of the
periodic system and the phase of the frequency response test signal often
provides the means of solving these problems in the experimental procedure.


2.4 APPLICATION OF FREQUENCY RESPONSE ANALYSIS 

The application of sinusoidal inputs to a dynamic system, with recording of
gain and phase lag, is widely used in such contexts as electronic circuits,
electromechanical devices, acoustics, audiometry, and data communication. This
methodology instruments the Fourier transform and is the principle of many
commercial transfer function analyzers.

The experimental procedure must fulfill the methodological prerequisites, and
it is essential to ensure that the system responds linearly at the input
magnitude used.

Another concern is that frequency response analysis of devices that contain 
friction and similar nonlinearities may produce poor results for tests with zero 
mean velocity. A possible alternative is to choose an input signal 

$$ u(t) = u_0 + u_1 \sin(\omega t) \qquad (2.22) $$

in order to avoid some of these problems. 

Transients from non-zero initial conditions may result in problems of accuracy. 
As the method is dependent upon steady-state conditions, the introduction 
of a delay between the application of a new test frequency and the start of 
the measurement interval is necessary to minimize the detrimental effects of 
initial transients. 

The method normally yields the gain without ambiguity. The phase shift is
obtained uniquely except for a multiple of 2π rad (or 360°). This ambiguity
may be resolved by choosing a certain experimental procedure. For low-pass
type transfer functions it may be known that the phase delay is, say, zero at
low frequencies. Let the frequency analysis be made for frequency points in
increasing order, starting from low frequencies where the phase delay is zero.
The phase lag at higher frequencies may then be obtained without ambiguity,
provided that the phase changes by less than π between adjacent test
frequencies.
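The procedure for resolving the 2π ambiguity amounts to phase unwrapping. The
Python sketch below uses a hypothetical low-pass process with dead time,
$e^{-0.5 s}/(s+1)$, whose phase lag far exceeds 180°. It computes the wrapped
phase at logarithmically spaced, increasing frequencies and removes 2π jumps
starting from the low-frequency end, where the phase is close to zero. The
unwrapping is valid only if the true phase changes by less than π between
adjacent test frequencies.

```python
import cmath
import math


def G(omega, delay=0.5):
    """Hypothetical low-pass process with dead time: e^{-s*delay} / (s + 1)."""
    s = 1j * omega
    return cmath.exp(-s * delay) / (s + 1.0)


def unwrap(phases):
    """Remove 2*pi jumps, assuming adjacent true phases differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))   # nearest equivalent increment
        out.append(out[-1] + d)
    return out


omegas = [0.01 * 1.1 ** j for j in range(80)]    # increasing test frequencies
wrapped = [cmath.phase(G(w)) for w in omegas]    # each value in (-pi, pi]
unwrapped = unwrap(wrapped)                      # starts near 0 at low frequency
exact = [-math.atan(w) - 0.5 * w for w in omegas]

print(f"highest frequency {omegas[-1]:.2f} rad/s: wrapped {wrapped[-1]:+.3f}, "
      f"unwrapped {unwrapped[-1]:+.3f}, exact {exact[-1]:+.3f} rad")
```

If the frequency grid is too coarse, the same routine silently returns a
phase that is off by a multiple of 2π, which is why the frequency points
should be taken densely enough.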

There are several advantages in using this procedure:

- A Bode diagram is obtained directly from data.
- Phase lag and delay time are easy to measure.
- Improved accuracy can be obtained if the measurement time is increased.
- Good results may be expected, even with poor signal-to-noise ratios.
- Averaging of nonlinearities provides approximate linear models.

However, a number of restrictions can also be noticed, such as:

- A special test signal is needed (only sinusoidal inputs).
- One experiment is needed for each test frequency.
- Long measurement time is required.
- To avoid biased results, the effects of initial conditions must be allowed
to disappear before measurements can be made.
- Frequency sweeping may not be possible.
- Experiments are only possible on stable systems.
- Frequency response analysis presupposes application to time-invariant
linear systems.

Obviously these restrictions determine the range of possible applications of 
frequency response analysis. 


2.5 SUMMARY 

The examples given in this chapter illustrate some common approaches to
system identification in both the time domain and the frequency domain. Given
an object to identify, all approaches are predicated upon certain choices of
experiment duration, test signals, and sampling frequency. The identification
methods have been used to fit a model in some model set, e.g., a weighting
function, a transfer function, or some parametric model.

As in all experimental work, it is necessary to consider carefully the
reproducibility of results, the statistical properties of data, and the
possibility of inaccurate measurements. These concerns affect the choice of
experimental conditions and identification method and will be discussed in
greater detail in the following chapters.


2.6 EXERCISES 


2.1 Consider the single-input single-output state space system

$$ \dot x(t) = A x(t) + B u(t), \quad x \in \mathbb{R}^n, \qquad y(t) = C x(t) $$

Show that the impulse response is $g(t) = C e^{At} B$. What is the effect on
the measured impulse response if there is a non-zero initial condition
$x(0) = x_0$?

2.2 Assume that the ideal impulse cannot be implemented, and that it is
replaced by the input

$$ u(t) = \begin{cases} 1/T, & 0 < t < T \\ 0, & t \ge T \end{cases} \qquad (2.24) $$

How should the impulse response be determined from data obtained when
using the input (2.24)?

2.3 Consider the setup for frequency analysis in Fig. 2.10. 



Figure 2.10 Noise-corrupted frequency response analysis. 

Assume that the disturbance v affects the output of the system so that
the resulting identification will be compromised. Also assume that the
measurement time is n periods for each test signal with frequency $\omega_i$.

a. How will the Bode plot be affected if the disturbance is a sinusoid of
frequency $\omega_v$, i.e., $v(t) = A_v \sin(\omega_v t)$?

b. Assume that frequency response analysis has been performed with the
measurement duration T, and assume that v is high-bandwidth noise with mean 0
and variance $\sigma^2$. How much must the measurement duration be increased
in order to reduce the variance of $\hat G$ by a factor of two?

c. Assume that the frequency analysis and the Bode plots will be used as a
basis for regulator design. Discuss how to choose a finite measurement time
at each frequency to obtain optimal closed-loop control. Is it possible to
choose a measurement strategy for this purpose?

2.4 The frequency response method allows estimation of the transfer function
$G_0(i\omega)$. The estimate may be written

$$ \hat G(i\omega) = G_0(i\omega) + \Delta G(i\omega) $$

where $G_0(i\omega)$ is the “true” transfer function and $\Delta G(i\omega)$ is
the contribution from measurement noise. Assume that the measurement noise is
“white” with mean value 0 and variance $\sigma^2$. Then it can be shown that

$$ E\{\Delta G(i\omega)\} = 0, \qquad \operatorname{Var}\{\Delta G(i\omega)\} = \;\ldots $$

Moreover, $\arg \Delta G(i\omega)$ has a rectangular (uniform) distribution on
the interval $[0, 2\pi)$.

The transfer function was estimated for 40 different frequencies. The
result, $\hat G(i\omega)$, is shown in Fig. 2.11. The correlation time T was
chosen as 50 s and the noise variance $\sigma^2$ was estimated to be 0.1 s.

The system will be controlled by a proportional regulator with a gain of 1. 
In order to decide the closed-loop system stability properties, it is helpful 
to ascertain whether the Nyquist contour encloses the point —1. Is it 
possible to extract this information from Fig. 2.11? 

2.5 Assume that there is a nonlinearity at the output of the analyzed system 
(see Fig. 2.12). Then the response to a sinusoidal input will also contain 
frequencies other than the input. What is the impact on the Bode plot? 
What does this mean for the choice of input signal? 

Figure 2.11 The transfer function $\hat G(i\omega)$ in Exercise 2.4 measured at
40 different frequencies.

Figure 2.12 Frequency response analysis with nonlinearity.

2.6 Discuss the requirements on sampling frequency of a digital
implementation of frequency analysis for a given desired bandwidth.

2.7 Show that it is possible to perform frequency response analysis according
to Eqs. (2.13)-(2.14), i.e., the correlation method, with a selected
measurement interval of $T = (\pi/\omega)k$. In addition, show that it is
possible to eliminate constant perturbations in the output by choosing
$T = (2\pi/\omega)k$ according to (2.13).



































Signals and Systems


3.1 INTRODUCTION 

This chapter provides a brief systematic description of the use of the
Laplace transform, the Fourier transform, and the z-transform in the context
of signals and systems. Effects of discretization and finite measurement time
are also treated at this early stage, as they are fundamental properties in
the context of measurements. Definitions are given of such spectra as
autospectra, cross spectra, power spectra, correlation, covariance, and
coherence spectra, and their dependence on input-output properties is
treated. Spectrum analysis and other signal processing based on these spectra
follow in Chapter 4.


3.2 TIME-DOMAIN AND FREQUENCY-DOMAIN TRANSFORMS

Consider a function x(t) which represents a measured or observed variable
and which is assumed to be defined for all time t. Let the Laplace
transform be defined as

$$ X(s) = \mathcal{L}\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-st}\, dt, \qquad x(t) = \mathcal{L}^{-1}\{X(s)\} = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} X(s)\, e^{st}\, ds \qquad (3.1) $$

where the argument $s = \sigma + i\omega$ is called complex frequency. Let the
spectrum be defined as the Fourier transform of x(t)

$$ X(i\omega) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt, \qquad x(t) = \mathcal{F}^{-1}\{X(i\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(i\omega)\, e^{i\omega t}\, d\omega \qquad (3.2) $$
Clearly, the Fourier transform and the Laplace transform coincide for the choice $s = i\omega$ in cases when the Fourier integral (3.2) exists. The Fourier transform provides a result in the form of a function $X(i\omega)$ with the argument $\omega$ interpreted as angular frequency [rad/s]. The functions $x(t)$ and $X(s)$ are called time-domain and frequency-domain representations, respectively. From the definitions (3.1) and (3.2), we notice that the Fourier and Laplace transforms exist when the integrals converge (i.e., take finite values). A major difference between the two is that the Laplace transform is valuable for the analysis of transient behavior, whereas the Fourier transform is mainly applicable to periodic signals.

Most problems can be formulated in a manner which permits all signals to be zero for $t < 0$, and it is customary to restrict the Laplace transform to the one-sided Laplace transform

$$X(s) = \mathcal{L}\{x(t)\} = \int_{0}^{\infty} x(t)e^{-st}\,dt \tag{3.3}$$
Figure 3.1 Discretized data $x_\Delta$ obtained from periodic sampling of a continuous-time variable $x(t)$.

This is identical to the two-sided Laplace transform if and only if $x(t) = 0$ for $t < 0$. An important property is uniqueness, i.e., if $f(t)$ and $g(t)$ both have the same Laplace transform, then $f(t)$ and $g(t)$ can differ only at a countable number of points. It is also straightforward to show that the Laplace transformation is a linear operation, i.e.,

$$\mathcal{L}\{a x_1(t) + b x_2(t)\} = a\mathcal{L}\{x_1(t)\} + b\mathcal{L}\{x_2(t)\} \tag{3.4}$$

for arbitrary numbers $a$ and $b$.


3.3 DISCRETIZED DATA 

A measured variable $x(t)$ is in many cases available only as periodic observations of $x(t)$ sampled with a time interval $h$ (the sampling period). Let the sampled values of $x(t)$ be represented by the sequence

$$\{x_k\}_{k=-\infty}^{\infty}, \qquad x_k = x(kh) \quad \text{for } k = \dots, -1, 0, 1, 2, \dots \tag{3.5}$$

For ideal sampling, it is required that the duration of each sampling be very short, so that the sampled function may be represented by a sequence of infinitely short impulses; see Fig. 3.1. Let the sampled function of time be expressed as

$$x_\Delta(t) = x(t)\cdot h\sum_{k=-\infty}^{\infty}\delta(t-kh) = x(t)\cdot\text{Ш}_h(t) \tag{3.6}$$

where

$$\text{Ш}_h(t) = h\sum_{k=-\infty}^{\infty}\delta(t-kh) \tag{3.7}$$

and where the sampling period $h$ is included as a factor to ensure that the averages over a sampling period of the original variable $x$ and of the sampled signal $x_\Delta$, respectively, are of the same magnitude.

Obviously, the original variable $x(t)$ and the sampled data are not identical, and it is thus necessary to consider the distortive effects of discretization. Consider the spectrum of the sampled signal $x_\Delta(t)$ obtained as the Fourier transform

$$X_\Delta(i\omega) = \mathcal{F}\{x_\Delta(t)\} = \mathcal{F}\{x(t)\} * \mathcal{F}\{\text{Ш}_h(t)\} \tag{3.8}$$

where

$$\mathcal{F}\{\text{Ш}_h(t)\} = \sum_{k=-\infty}^{\infty}\delta\left(\omega - \frac{2\pi}{h}k\right) = \frac{h}{2\pi}\,\text{Ш}_{2\pi/h}(\omega) \tag{3.9}$$

so that

$$X_\Delta(i\omega) = \mathcal{F}\{x(t)\} * \mathcal{F}\{\text{Ш}_h(t)\} = \sum_{k=-\infty}^{\infty} X\left(i\left(\omega - \frac{2\pi}{h}k\right)\right) \tag{3.10}$$

The Fourier transform $X_\Delta$ of the sampled variable is thus a periodic repetition of the original spectrum $X(i\omega)$ along the frequency axis, with a period equal to the sampling frequency $\omega_s = 2\pi/h$; see Fig. 3.2.
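The periodicity in (3.10) can be checked numerically: two sinusoids whose frequencies differ by a multiple of the sampling frequency produce identical samples. The following Python sketch (function names and signal values are illustrative, not from the text) samples a 1-Hz and an 11-Hz cosine at 10 Hz and also computes the apparent (folded) frequency at which a sampled cosine shows up.

```python
import math


def sample_cosine(freq_hz, h, n_samples):
    """Samples x_k = cos(2*pi*freq_hz*k*h), k = 0, ..., n_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * k * h) for k in range(n_samples)]


def apparent_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs/2] at which a sampled cosine of freq_hz appears.

    Shifting by a multiple of fs leaves the samples unchanged, cf. (3.10),
    and a cosine is even in frequency, hence the absolute value.
    """
    return abs(freq_hz - fs_hz * round(freq_hz / fs_hz))


h = 0.1                                  # sampling period [s], i.e. fs = 10 Hz
x_1hz = sample_cosine(1.0, h, 50)
x_11hz = sample_cosine(11.0, h, 50)

# Largest difference between the two sample sequences (rounding level only).
max_diff = max(abs(a - b) for a, b in zip(x_1hz, x_11hz))
print(max_diff)
print(apparent_frequency(11.0, 10.0), apparent_frequency(9.0, 10.0))
```

Both 11 Hz and 9 Hz fold onto 1 Hz, which is why a signal must be band-limited before sampling.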


Theorem 3.1—The sampling theorem (SHANNON) 

The continuous-time variable $x(t)$ may be reconstructed from the samples $\{x_k\}$ if the sampling frequency is at least twice the highest frequency for which $X(i\omega)$ is non-zero.

Proof: Let $W_a(\omega)$ denote the spectral window

$$W_a(\omega) = \begin{cases} 1, & |\omega| \le a \\ 0, & |\omega| > a \end{cases} \tag{3.11}$$

Subject to the assumption that $|X(i\omega)| = 0$ for $|\omega| > \omega_s/2 = \pi/h$, it holds that

$$X_\Delta(i\omega)\cdot W_{\pi/h}(\omega) = X(i\omega) \tag{3.12}$$

so that the original spectrum $X(i\omega)$ may be recovered. The original variable $x(t)$ may thus be recovered from (3.12) as

$$x(t) = \mathcal{F}^{-1}\{X_\Delta(i\omega)\cdot W_{\pi/h}(\omega)\} = \mathcal{F}^{-1}\{X_\Delta(i\omega)\} * \mathcal{F}^{-1}\{W_{\pi/h}(\omega)\} = x_\Delta(t) * \left(\frac{1}{h}\,\frac{\sin(\pi t/h)}{\pi t/h}\right) = \sum_{k=-\infty}^{\infty} x_k\,\frac{\sin\frac{\pi}{h}(t-kh)}{\frac{\pi}{h}(t-kh)} \tag{3.13}$$


The formula (3.13) is called Shannon interpolation and is valid only for infinitely long data sequences. Notice also that (3.13) uses future as well as past samples, so reconstruction of the continuous-time signal $x(t)$ would require a noncausal filter and cannot be done in real-time operation. The frequency $\omega_N = \omega_s/2 = \pi/h$ is called the Nyquist frequency and indicates the upper limit of distortion-free sampling. Failure to respect this limit leads to interference between the sampling frequency and the sampled signal, whereby components above $\omega_N$ appear at lower frequencies (aliasing); see Fig. 3.3.
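A minimal sketch of Shannon interpolation (3.13) in Python. Since only finitely many samples are available, the sum is truncated, which makes the result an approximation that (3.13) itself does not involve. The test signal, the record length, and the function name are assumptions chosen for illustration.

```python
import math


def shannon_interpolate(samples, h, t, k0=0):
    """Truncated Shannon interpolation, cf. (3.13).

    samples[j] is assumed to equal x((k0 + j) * h); only these samples are used,
    so the result approximates the infinite sum.
    """
    total = 0.0
    for j, x_k in enumerate(samples):
        arg = math.pi * (t / h - (k0 + j))   # (pi/h)(t - kh)
        total += x_k * (1.0 if arg == 0.0 else math.sin(arg) / arg)
    return total


# Band-limited test signal: 0.1 Hz cosine sampled with h = 1 s (Nyquist 0.5 Hz).
h = 1.0
k0 = -2000
samples = [math.cos(2 * math.pi * 0.1 * k * h) for k in range(k0, 2001)]

t = 0.5   # halfway between two sampling instants
print(shannon_interpolate(samples, h, t, k0), math.cos(2 * math.pi * 0.1 * t))
```

At sampling instants the interpolation reproduces the samples; between them the error shrinks as more samples on both sides of $t$ are included, which again reflects the noncausal nature of (3.13).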


3.4 THE Z-TRANSFORM 

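The z-transform of a signal $x(t)$ discretized as the sequence $\{x_k\}$ is defined as

$$X_z(z) = \mathcal{Z}\{x\} = \sum_{k=-\infty}^{\infty} x_k z^{-k} \tag{3.14}$$

with the inverse transform

$$x_k = \frac{1}{2\pi i}\oint X_z(z)\,z^{k-1}\,dz \tag{3.15}$$

where the integral is taken counterclockwise along a closed contour in the region of convergence that encircles the origin.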

The z-transform is an infinite power series in the complex variable $z^{-1}$ where $\{x_k\}$ constitutes the sequence of coefficients. It therefore exists only for those values of $z$ for which this series converges. A sufficient condition for existence of the z-transform is convergence of the power series

$$\sum_{k=-\infty}^{\infty} |x_k|\cdot|z^{-k}| < \infty \tag{3.16}$$

The region of convergence for a finite-duration signal is the entire z-plane except possibly $z = 0$ and $z = \infty$. For a one-sided infinite-duration signal with $x_k = 0$ for $k < 0$, a number $r$ can usually be found so that the power series converges for $|z| > r$.
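As a concrete case of the region of convergence, consider the one-sided sequence $x_k = a^k$, $k \ge 0$, whose z-transform is $1/(1 - az^{-1})$ for $|z| > |a|$, so that $r = |a|$. The sketch below (the value of $a$ and the evaluation points are arbitrary choices) compares a partial sum of (3.14) with this closed form inside the region of convergence, and shows the partial sums growing without bound outside it.

```python
import cmath


def z_transform_partial(x, z):
    """Partial sum of (3.14) for a one-sided sequence: sum_k x[k] * z**(-k)."""
    return sum(x_k * z ** (-k) for k, x_k in enumerate(x))


a = 0.8
x = [a ** k for k in range(400)]          # x_k = a^k, k = 0..399

z_in = 1.2 * cmath.exp(0.7j)              # |z| = 1.2 > |a|: inside the region of convergence
approx = z_transform_partial(x, z_in)
exact = 1 / (1 - a / z_in)
print(abs(approx - exact))                # rounding level: the series converges

z_out = 0.5                               # |z| < |a|: outside the region of convergence
growth = [abs(z_transform_partial(x[:n], z_out)) for n in (10, 20, 40)]
print(growth)                             # partial sums keep growing
```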

A direct application of the discretized variable $x_\Delta(t)$ in (3.6) verifies that the spectrum of $x_\Delta$ is related to the z-transform $X_z(z)$ as

$$X_\Delta(i\omega) = \mathcal{F}\{x(t)\cdot\text{Ш}_h(t)\} = h\sum_{k=-\infty}^{\infty} x_k\exp(-i\omega kh) = hX_z(e^{i\omega h}) \tag{3.17}$$


3.5 FINITE MEASUREMENT TIME 


Sampling with interpolation according to the Shannon theorem presupposes infinitely long data records, which are never available in the context of identification. In order to represent a finite measurement time $T = Nh$, with all measurements starting at time $t = 0$ and ending after $N$ measurements, we introduce the functions

$$\Pi_T(t) = \begin{cases} 0, & t < 0 \\ 1, & 0 \le t \le T \\ 0, & t > T \end{cases} \qquad \mathcal{F}\{\Pi_T(t)\} = 2T\,\frac{\sin(\omega T/2)}{\omega T}\,e^{-i\omega T/2} \tag{3.18}$$

and in discrete time

$$\Pi_N(k) = \begin{cases} 0, & k < 0 \\ 1, & 0 \le k \le N-1 \\ 0, & k \ge N \end{cases} \qquad \mathcal{Z}\{\Pi_N\} = \frac{1 - z^{-N}}{1 - z^{-1}} \tag{3.19}$$
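The closed form in (3.19) is the geometric sum $\sum_{k=0}^{N-1} z^{-k}$. On the unit circle $z = e^{i\omega h}$ its magnitude equals $|\sin(\omega Nh/2)/\sin(\omega h/2)|$, the discrete counterpart of $|\mathcal{F}\{\Pi_T\}|$ in (3.18). A short numerical check, with $N$, $h$, and $\omega$ picked arbitrarily:

```python
import cmath
import math


def window_z_direct(N, z):
    """Z{Pi_N} evaluated as the finite sum sum_{k=0}^{N-1} z^(-k)."""
    return sum(z ** (-k) for k in range(N))


def window_z_closed(N, z):
    """Closed form (1 - z^(-N)) / (1 - z^(-1)) from (3.19); requires z != 1."""
    return (1 - z ** (-N)) / (1 - z ** (-1))


N, h, omega = 16, 0.1, 2.3
z = cmath.exp(1j * omega * h)              # point on the unit circle
direct = window_z_direct(N, z)
closed = window_z_closed(N, z)
dirichlet = abs(math.sin(omega * N * h / 2) / math.sin(omega * h / 2))
print(abs(direct - closed), abs(closed), dirichlet)
```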



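Table 3.1 Properties of the Fourier transform

| Property | Time function | Fourier transform |
|---|---|---|
| Time translation | $f(t-\tau)$ | $F(i\omega)e^{-i\omega\tau}$ |
| Time scaling | $f(at)$ | $\frac{1}{\lvert a\rvert}F\left(\frac{i\omega}{a}\right)$ |
| Linearity | $af(t) + bg(t)$ | $a\mathcal{F}\{f\} + b\mathcal{F}\{g\}$ |
| Plancherel theorem | $f * g$ | $\mathcal{F}\{f\}\cdot\mathcal{F}\{g\}$ |
| Multiplication | $f(t)\cdot g(t)$ | $\mathcal{F}\{f\} * \mathcal{F}\{g\}$ |
| Dirac impulse | $\delta(t)$ | $1$ |
| Delayed impulse | $\delta(t-\tau)$ | $e^{-i\omega\tau}$ |
| Poisson's formula | $\text{Ш}_h(t) = h\sum_{k=-\infty}^{+\infty}\delta(t-kh)$ | $\sum_{k=-\infty}^{+\infty}\delta\left(\omega - \frac{2\pi}{h}k\right)$ |
| Symmetric window | $W_T(t) = 1$ for $\lvert t\rvert \le T$, $0$ for $\lvert t\rvert > T$ | $2T\,\frac{\sin\omega T}{\omega T}$ |
| Rectangular window | $\Pi_T(t) = 1$ for $0 \le t \le T$, $0$ otherwise | $2T\,\frac{\sin(\omega T/2)}{\omega T}\,e^{-i\omega T/2}$ |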


The spectrum of any signal is distorted by finite measurement time: for any variable $x(t)$ measured during a finite measurement interval $T$, the spectrum is

$$\mathcal{F}\{x(t)\cdot\Pi_T(t)\} = \mathcal{F}\{x(t)\} * \mathcal{F}\{\Pi_T(t)\} \tag{3.20}$$

All spectrum estimation based on finite-time measurement is thus distorted by the convolution with $\mathcal{F}\{\Pi_T(t)\}$; see Fig. 3.4. Because $\mathcal{F}\{\Pi_T(t)\}$ is non-zero over an unbounded frequency range, the original spectrum is dispersed over large frequency ranges and becomes non-zero beyond any choice of the Nyquist frequency. Hence, the conditions of Shannon's sampling theorem are never strictly satisfied. A rigorous attitude toward sampling is, therefore, that a variable cannot be sampled in a finite measurement interval without spectral distortion arising.
Table 3.2 Properties of the Laplace transform

| Property | Time function | Laplace transform |
|---|---|---|
| Linearity | $af(t) + bg(t)$ | $a\mathcal{L}\{f\} + b\mathcal{L}\{g\}$ |
| Time derivative | $\dfrac{df}{dt}$ | $sF(s) - f(0)$ |
| Time translation | $f(t-\tau)$ | $F(s)e^{-s\tau}$ |
| Multiplication by exponential | $f(t)e^{-at}$ | $F(s+a)$ |
| Multiplication by $t$ | $tf(t)$ | $-\dfrac{dF(s)}{ds}$ |

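Table 3.3 Properties of the z-transform

| Property | Relation |
|---|---|
| Convolution | $\mathcal{Z}\{f * g\} = \mathcal{Z}\{f\}\cdot\mathcal{Z}\{g\}$ |
| Multiplication | $\mathcal{Z}\{f\cdot g\} = \mathcal{Z}\{f\} * \mathcal{Z}\{g\}$ |
| Time translation | $\mathcal{Z}\{f((k-d)h)\} = z^{-d}\mathcal{Z}\{f(kh)\}$ |
| Linearity | $\mathcal{Z}\{af + bg\} = a\mathcal{Z}\{f\} + b\mathcal{Z}\{g\}$ |
| Multiplication by $a^k$ | $\mathcal{Z}\{a^k f(k)\} = F_z(a^{-1}z)$ |
| Final value | $f(\infty) = \lim_{z\to 1}(1 - z^{-1})F_z(z)$ |
| Initial value | $f(0) = \lim_{z\to\infty} F_z(z)$ |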


Example 3.1—Spectral effects of finite measurement time 
Consider a signal

$$x(t) = \sin(\omega_0 t)\cdot\Pi_T(t) \tag{3.21}$$

consisting of a sinusoid that is truncated after some time $T$.

Figure 3.4 The spectral properties of $\mathcal{F}\{\Pi_T(t)\}$ shown for a measurement duration $T = 10$. The zero crossings appear at frequencies that are multiples of $1/T$ (i.e., at $\omega = 2\pi k/T$). The lower graph shows a discrete-time counterpart for $T = Nh = 10$ for $N = 10, 100, 1000$, respectively. The maximum magnitude of all graphs has been normalized to one.

The Fourier transform of the signal is

$$X(i\omega) = \mathcal{F}\{\sin(\omega_0 t)\cdot\Pi_T(t)\} = \mathcal{F}\{\sin\omega_0 t\} * \mathcal{F}\{\Pi_T(t)\} \tag{3.22}$$

where $\mathcal{F}\{\Pi_T(t)\}$ is shown in Fig. 3.4. A remarkable conclusion is that the line spectrum of the pure sinusoid (impulses at $\omega = \pm\omega_0$) never appears within any finite measurement time; it is replaced by copies of $\mathcal{F}\{\Pi_T(t)\}$ centered at $\pm\omega_0$. Thus, the effect of finite measurement time on the spectrum is considerable, and the resultant distortion of the spectrum (the spectral leakage) cannot be ignored. ■
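Example 3.1 can be made concrete because the integral in (3.22) has a closed form: writing $\sin\omega_0 t = (e^{i\omega_0 t} - e^{-i\omega_0 t})/2i$ gives two terms of the type $\int_0^T e^{iat}\,dt$. The sketch below (the values $T = 10$ and $\omega_0 = 2\pi$, i.e., ten full periods, are assumptions for illustration) evaluates $|X(i\omega)|$ at the sinusoid frequency, at the first zero of the main lobe, and at the first sidelobe, showing that energy leaks to $\omega \ne \omega_0$.

```python
import cmath
import math


def exp_integral(a, T):
    """Closed form of the integral of exp(i*a*t) over 0 <= t <= T."""
    if a == 0:
        return complex(T, 0.0)
    return (cmath.exp(1j * a * T) - 1) / (1j * a)


def truncated_sine_spectrum(omega, omega0, T):
    """X(i*omega) = integral_0^T sin(omega0*t) * exp(-i*omega*t) dt, cf. (3.21)-(3.22)."""
    # sin(omega0*t) = (exp(i*omega0*t) - exp(-i*omega0*t)) / (2i)
    return (exp_integral(omega0 - omega, T) - exp_integral(-omega0 - omega, T)) / 2j


T = 10.0
omega0 = 2 * math.pi
peak = abs(truncated_sine_spectrum(omega0, omega0, T))                          # about T/2
first_zero = abs(truncated_sine_spectrum(omega0 + 2 * math.pi / T, omega0, T))  # about 0
sidelobe = abs(truncated_sine_spectrum(omega0 + 3 * math.pi / T, omega0, T))    # clearly non-zero
print(peak, first_zero, sidelobe, sidelobe / peak)
```

The first sidelobe is roughly a fifth of the peak, and increasing $T$ narrows the lobes without removing them.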

The discrete Fourier transform

Consider a finite-length sequence $\{x_k\}_{k=0}^{N-1}$ that is zero outside the interval $0 \le k \le N-1$. Evaluation of the z-transform $X_z(z)$ at $N$ equally spaced points on the unit circle, $z = \exp(i\omega_k h) = \exp(i(2\pi/Nh)kh)$ for $k = 0, 1, \dots, N-1$, defines the discrete Fourier transform (DFT) of a signal $x$ with a sampling period $h$ and $N$ measurements

$$X_k = \mathcal{F}_{(h,N)}\{x(kh)\} = h\sum_{\ell=0}^{N-1} x_\ell\exp(-i\omega_k \ell h) = hX_z(e^{i\omega_k h}) \tag{3.23}$$

Notice that the discrete Fourier transform $\{X_k\}$ is only defined at the discrete frequency points

$$\omega_k = \frac{2\pi}{Nh}k, \qquad k = 0, 1, \dots, N-1 \tag{3.24}$$

Figure 3.5 Input-output models with transfer functions $G(s)$ (continuous time, from $U(s)$ to $Y(s)$) and $H(z)$ (discrete time, from $U(z)$ to $Y(z)$).

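For a finite-time measurement ($N$ samples with a sampling period $h$) of a variable $x$, the relationship between the Fourier transform $\mathcal{F}$ of the continuous-time variables, the discrete Fourier transform, and the z-transform is as follows:

$$\begin{aligned} X_k = \mathcal{F}_{(h,N)}\{x(kh)\} &= \mathcal{F}\{x(t)\cdot\text{Ш}_h(t)\cdot\Pi_{Nh}(t)\} \\ &= h\,\mathcal{Z}\{x(kh)\cdot\Pi_N(k)\}\big|_{z=\exp(i\omega_k h)} \\ &= h\,\mathcal{Z}\{x\} * \mathcal{Z}\{\Pi_N(k)\}\big|_{z=\exp(i\omega_k h)} \end{aligned} \tag{3.25}$$

which follows from (3.6) and (3.19). We conclude that

$$X_k = \mathcal{F}_{(h,N)}\{x(kh)\} = X_\Delta(i\omega) * \frac{1 - e^{-i\omega hN}}{1 - e^{-i\omega h}}\bigg|_{\omega=\omega_k} \tag{3.26}$$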

In fact, the discrete Fourier transform adapts the Fourier transform and the 
z-transform to the practical requirements of finite measurements. 
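A direct, $O(N^2)$ Python sketch of the DFT as scaled in (3.23), including the sampling period $h$ as a factor, together with the frequency grid (3.24). It favors clarity over speed, and the function names are illustrative.

```python
import cmath
import math


def dft_scaled(x, h):
    """X_k = h * sum_l x_l * exp(-i*omega_k*l*h), k = 0..N-1, cf. (3.23)."""
    N = len(x)
    X = []
    for k in range(N):
        omega_k = 2 * math.pi * k / (N * h)          # frequency grid (3.24)
        X.append(h * sum(x_ell * cmath.exp(-1j * omega_k * ell * h)
                         for ell, x_ell in enumerate(x)))
    return X


def dft_frequencies(N, h):
    """omega_k = 2*pi*k / (N*h) for k = 0..N-1 [rad/s]."""
    return [2 * math.pi * k / (N * h) for k in range(N)]


# A complex exponential completing exactly 3 periods in N samples
# puts all of its energy into the single bin k = 3, with value h*N.
N, h = 16, 0.5
x = [cmath.exp(2j * math.pi * 3 * ell / N) for ell in range(N)]
X = dft_scaled(x, h)
print([round(abs(X_k), 6) for X_k in X])
```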


3.6 THE TRANSFER FUNCTION 


Consider a linear system with input $u$ (stimulus) and output $y$ (response); the alternative state-space representation is treated at the end of this section. The dependence of the output of a linear system on its input is characterized by the convolution equation

$$y(t) = \int_0^\infty g(\tau)u(t-\tau)\,d\tau + v(t) \tag{3.27}$$
where $v(t)$ is some external input that represents errors and disturbances and $g(t)$ is the impulse response or weighting function. Application of the Laplace transform to Eq. (3.27) gives

$$Y(s) = G(s)U(s) + V(s) \tag{3.28}$$

and provides the frequency-domain input-output relation with $G(s) = \mathcal{L}\{g\}$ as the transfer function. A similar relationship holds for discretized input-output data. Consider the model


$$y_k = \sum_{\ell=0}^{\infty} h_\ell u_{k-\ell} + v_k = \sum_{j=-\infty}^{k} h_{k-j}u_j + v_k, \qquad k = \dots, -1, 0, 1, 2, \dots \tag{3.29}$$

with the pulse response $h(kh) = \{h_k\}_{k=0}^{\infty}$ and its z-transform, the pulse transfer function

$$H(z) = \mathcal{Z}\{h(kh)\} = \sum_{k=0}^{\infty} h_k z^{-k} \tag{3.30}$$

The pulse transfer function $H(z)$ is obtained as the ratio

$$H(z) = \frac{Y_z(z)}{U_z(z)} \tag{3.31}$$

when the disturbance $v$ is zero.
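For finite, causal sequences the convolution sum (3.29) with $v = 0$ is polynomial multiplication in $z^{-1}$, so (3.31) holds exactly wherever $U_z(z) \ne 0$. The sketch below uses an arbitrary truncated first-order pulse response and an arbitrary input, both chosen only for illustration.

```python
def convolve(h_seq, u):
    """y_k = sum_l h_l * u_{k-l} for causal finite sequences, cf. (3.29) with v = 0."""
    y = [0.0] * (len(h_seq) + len(u) - 1)
    for ell, h_ell in enumerate(h_seq):
        for j, u_j in enumerate(u):
            y[ell + j] += h_ell * u_j
    return y


def z_transform(seq, z):
    """sum_k seq[k] * z**(-k) for a sequence starting at k = 0."""
    return sum(s * z ** (-k) for k, s in enumerate(seq))


h_seq = [0.5 * 0.9 ** k for k in range(30)]   # truncated pulse response h_k
u = [1.0, -0.5, 0.25, 2.0]                    # input sequence u_k
y = convolve(h_seq, u)

z = 1.1 + 0.4j                                # any point with U_z(z) != 0
ratio = z_transform(y, z) / z_transform(u, z)
print(abs(ratio - z_transform(h_seq, z)))     # rounding level: H(z) = Y_z(z) / U_z(z)
```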


State-space systems 

Alternatives to the input-output representations by means of transfer functions are the state-space representations. Consider the following finite-dimensional discrete-time state-space equation with a state vector $x_k \in \mathbb{R}^n$, input $u_k \in \mathbb{R}^p$, and observations $y_k \in \mathbb{R}^m$

$$\begin{cases} x_{k+1} = \Phi x_k + \Gamma u_k, & k = 0, 1, 2, \dots \\ y_k = Cx_k + Du_k \end{cases} \tag{3.32}$$

with the pulse transfer function

$$H(z) = C(zI - \Phi)^{-1}\Gamma + D \tag{3.33}$$

and the output variable

$$Y(z) = C\sum_{k=0}^{\infty}\Phi^k z^{-k}x_0 + H(z)U(z) \tag{3.34}$$

where possible effects of the initial conditions $x_0$ appear as the first term. Notice that the initial conditions $x_0$ can be viewed as the net effect of the input in the time interval $(-\infty, 0)$, which can be verified by comparison with Eq. (3.29).
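Expanding (3.33) in powers of $z^{-1}$ gives the pulse response $h_0 = D$ and $h_k = C\Phi^{k-1}\Gamma$ for $k \ge 1$. The sketch below (single input and single output, plain Python lists as matrices; the numerical system is an arbitrary stable example) simulates (3.32) with a unit pulse and compares the result with these coefficients. It also shows the free response $y_k = C\Phi^k x_0$, which corresponds to the first term of (3.34).

```python
def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def simulate(Phi, Gamma, C, D, u, x0):
    """Single-input single-output simulation of (3.32)."""
    x, y = list(x0), []
    for u_k in u:
        y.append(dot(C, x) + D * u_k)                                  # y_k = C x_k + D u_k
        x = [p + g * u_k for p, g in zip(mat_vec(Phi, x), Gamma)]      # x_{k+1} = Phi x_k + Gamma u_k
    return y


def pulse_response(Phi, Gamma, C, D, n):
    """h_0 = D and h_k = C Phi^(k-1) Gamma, k = 1..n-1 (series expansion of (3.33))."""
    h, v = [D], list(Gamma)
    for _ in range(1, n):
        h.append(dot(C, v))
        v = mat_vec(Phi, v)
    return h


Phi = [[0.5, 0.1], [0.0, 0.8]]
Gamma = [1.0, 0.5]
C = [1.0, -1.0]
D = 0.2

impulse = [1.0] + [0.0] * 9
y_pulse = simulate(Phi, Gamma, C, D, impulse, [0.0, 0.0])
h_k = pulse_response(Phi, Gamma, C, D, 10)
y_free = simulate(Phi, Gamma, C, D, [0.0] * 3, [1.0, 0.0])    # zero input, x0 != 0
print(y_pulse[:4], h_k[:4], y_free)
```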


3.7 SIGNAL POWER AND ENERGY 


For a signal $x(t)$, the instantaneous power at time $t$ is defined as

$$p_{xx}(t) = x(t)\cdot x^*(t) \tag{3.35}$$

where the asterisk denotes complex conjugate and transpose. For example, for a scalar signal $x(t) = a + ib$ the instantaneous power is $p_{xx}(t) = x(t)\cdot x^*(t) = a^2 + b^2$. The instantaneous power of the interaction between two signals $x$ and $y$ is

$$p_{xy}(t) = x(t)\cdot y^*(t) = p_{yx}^*(t) \tag{3.36}$$

The average power over an interval $[t_0, t_0+T]$ is

$$P_{xx}(t_0, T) = \frac{1}{T}\int_{t_0}^{t_0+T} x(t)x^*(t)\,dt, \qquad P_{xy}(t_0, T) = \frac{1}{T}\int_{t_0}^{t_0+T} x(t)y^*(t)\,dt \tag{3.37}$$

The energy of a signal is the integral of the power over time

$$e_{xx} = \int_{-\infty}^{+\infty} x(t)x^*(t)\,dt \tag{3.38}$$

and the interaction energy between two signals $x$ and $y$ is defined as

$$e_{xy}(\tau) = \int_{-\infty}^{+\infty} x(t)y^*(t-\tau)\,dt, \qquad e_{yx}(\tau) = \int_{-\infty}^{+\infty} y(t)x^*(t-\tau)\,dt = e_{xy}^*(-\tau) \tag{3.39}$$

The signals $x$ and $y$ are said to be uncorrelated if the interaction energy $e_{xy}(\tau) = 0$.


Remark: The energy definitions (3.35-3.39) should not be confused with the 
energy definitions used in physics. 
3.8 SPECTRA AND COVARIANCE FUNCTIONS


The spectral density of energy, or the energy spectrum, is defined as

$$E_{xx}(i\omega) = X(i\omega)X^*(i\omega) \tag{3.40}$$

whereas the cross energy spectrum between two signals $x$ and $y$ with Fourier transforms $X$ and $Y$ is defined as

$$E_{xy}(i\omega) = X(i\omega)Y^*(i\omega) \tag{3.41}$$

According to Parseval's relation, the total interaction energy satisfies

$$\int_{-\infty}^{+\infty} x(t)y^*(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{+\infty} X(i\omega)Y^*(i\omega)\,d\omega \tag{3.42}$$

which verifies that the signal energy is independent of the choice of representation in time or frequency. According to the Plancherel theorem, the product of two Fourier transforms equals the Fourier transform of the convolution of the two time-domain signals; thus it follows that the energy cross spectrum $E_{xy}(i\omega)$ is

$$E_{xy}(i\omega) = X(i\omega)Y^*(i\omega) = \mathcal{F}\{x(t)\}\cdot\mathcal{F}\{y^*(-t)\} = \mathcal{F}\left\{\int_{-\infty}^{+\infty} x(t)y^*(t-\tau)\,dt\right\} = \mathcal{F}\{e_{xy}(\tau)\} \tag{3.43}$$


The relationship (3.43) is known as the Wiener-Khintchine theorem. For signals with infinite energy it makes better sense to consider the cross covariance

$$C_{xy}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)y^*(t-\tau)\,dt \tag{3.44}$$

and the power cross spectrum (or cross spectrum) between the signals $x$ and $y$. The power cross spectrum $S_{xy}$ and the cross-covariance function are related as

$$S_{xy}(i\omega) = \mathcal{F}\{C_{xy}(\tau)\} \tag{3.45}$$

Similar relationships exist between the autospectrum and the autocovariance function as

$$S_{xx}(i\omega) = \mathcal{F}\{C_{xx}(\tau)\} = \mathcal{F}\left\{\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)x^*(t-\tau)\,dt\right\} \tag{3.46}$$
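Finite-length analogues of (3.42) and (3.43) can be verified exactly with the unscaled DFT $X_k = \sum_n x_n e^{-i2\pi kn/N}$ (the factor $h$ of (3.23) is omitted here) and a circular interaction energy $r_m = \sum_n x_n y^*_{(n-m) \bmod N}$. The circular shift is an assumption of this sketch that makes the identities exact for finite $N$; it is not the same as the infinite-time definitions in the text, and the data are arbitrary.

```python
import cmath
import math


def dft(x):
    """Unscaled DFT X_k = sum_n x_n exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def circular_cross(x, y):
    """r_m = sum_n x_n * conj(y_{(n-m) mod N}), a circular analogue of e_xy(tau) in (3.39)."""
    N = len(x)
    return [sum(x[n] * complex(y[(n - m) % N]).conjugate() for n in range(N))
            for m in range(N)]


x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
y = [0.5, -1.0, 2.0, 1.5, 0.0, 1.0]
N = len(x)
X, Y = dft(x), dft(y)

# Parseval, cf. (3.42): sum_n x_n y_n^* = (1/N) sum_k X_k Y_k^*
energy_time = sum(x_n * complex(y_n).conjugate() for x_n, y_n in zip(x, y))
energy_freq = sum(X_k * Y_k.conjugate() for X_k, Y_k in zip(X, Y)) / N

# Wiener-Khintchine, cf. (3.43): DFT{r} = X * conj(Y), bin by bin
R = dft(circular_cross(x, y))
wk_error = max(abs(R_k - X_k * Y_k.conjugate()) for R_k, X_k, Y_k in zip(R, X, Y))
print(energy_time, energy_freq, wk_error)
```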
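Table 3.4 Some properties of spectra $S$ and covariances $C$ obtained from a linear process $y(t) = g(t) * u(t) + v(t)$, with $u$ and $v$ uncorrelated

| Quantity | Relation |
|---|---|
| Cross-covariance functions | $C_{yu}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} y(t)u^*(t-\tau)\,dt$, $\quad C_{uy}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} u(t)y^*(t-\tau)\,dt$ |
| Autocovariance function | $C_{uu}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} u(t)u^*(t-\tau)\,dt$ |
| Cross-covariance from the linear system | $C_{yu}(\tau) = g(\tau) * C_{uu}(\tau)$ |
| Autospectrum | $S_{uu}(i\omega) = \mathcal{F}\{C_{uu}(\tau)\}$ |
| Power cross spectra (Wiener-Khintchine theorem) | $S_{uy}(i\omega) = \mathcal{F}\{C_{uy}(\tau)\}$, $\quad S_{yu}(i\omega) = \mathcal{F}\{C_{yu}(\tau)\}$ |
| Power cross spectra from linear systems | $S_{yu}(i\omega) = G(i\omega)S_{uu}(i\omega)$, $\quad S_{uy}(i\omega) = S_{uu}(i\omega)G^*(i\omega)$, $\quad S_{yy}(i\omega) = G(i\omega)S_{uu}(i\omega)G^*(i\omega) + S_{vv}(i\omega)$ |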

The power spectra and covariance functions are thus related according to the Wiener-Khintchine theorem (3.45)-(3.46).


3.9 CORRELATION AND COHERENCE 

Assume that

$$y(t) = x(t) + v(t) = g(t) * u(t) + v(t) \tag{3.47}$$

where $y(t)$ is an observation of a variable $x(t)$ corrupted by a variable $v(t)$, which is some external input that represents disturbances. A crude measure of the relative contributions of the signal $x$ and the noise $v$ to the power of the observed output $y$ is the signal-to-noise ratio (SNR)

$$\text{SNR} = \frac{e_{xx}}{e_{vv}} = \frac{e_{yy}}{e_{vv}} - 1 \tag{3.48}$$

where the second equality holds if the signal $x$ and the noise $v$ are uncorrelated, i.e., if the interaction energy $e_{xv} = 0$. The correlation coefficient $\rho$ between two signals $x$ and $y$ is defined as the ratio

$$\rho(\tau) = \frac{C_{xy}(\tau)}{\sqrt{|C_{xx}(0)|}\sqrt{|C_{yy}(0)|}} \tag{3.49}$$


The quadratic coherence spectrum between the two signals $x$ and $y$ is defined as the ratio

$$\gamma_{xy}^2(\omega) = \frac{|S_{xy}(i\omega)|^2}{S_{xx}(i\omega)S_{yy}(i\omega)} \tag{3.50}$$

where the quadratic coherence always takes on a value in the interval $0 \le \gamma^2(\omega) \le 1$, with a value close to one if the noise level is low ($S_{vv} \ll |G|^2 S_{uu}$).
The coherence spectrum is particularly interesting as a test of linearity in an input-output relationship. For instance, given a linear model (3.47) with observations $y$, input $u$, and with $x$, $v$ not available to measurement, it holds that

$$\gamma_{uy}^2(\omega) = \frac{|S_{uy}(i\omega)|^2}{S_{uu}(i\omega)S_{yy}(i\omega)} = \frac{|G(i\omega)|^2 S_{uu}^2(i\omega)}{S_{uu}(i\omega)\left(|G(i\omega)|^2 S_{uu}(i\omega) + S_{vv}(i\omega)\right)} = \frac{1}{1 + \dfrac{S_{vv}(i\omega)}{S_{uu}(i\omega)|G(i\omega)|^2}} \tag{3.51}$$

Conversely, if in examining an input-output relationship a coherence value close to one is obtained, it may be inferred that the noise level is low and that there is a linear response of the type (3.47) between input and output.

The coherence function thus expresses the degree of linear correlation in the frequency domain between the input $u$ and the output $y$, and may be viewed as a type of correlation function in the frequency domain. There is no immediate counterpart in the time domain, i.e., the inverse Fourier transform of $\gamma_{yu}^2$ has no direct interpretation.
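The chain of equalities in (3.51) can be checked at a single frequency: with $S_{uy} = S_{uu}G^*$ and $S_{yy} = |G|^2 S_{uu} + S_{vv}$ from Table 3.4 (assuming $u$ and $v$ uncorrelated), the definition (3.50) and the closed form (3.51) give the same value. The numbers below are arbitrary.

```python
def coherence(S_uy, S_uu, S_yy):
    """Quadratic coherence |S_uy|^2 / (S_uu * S_yy), cf. (3.50)."""
    return abs(S_uy) ** 2 / (S_uu * S_yy)


def coherence_linear_model(G, S_uu, S_vv):
    """Closed form (3.51): 1 / (1 + S_vv / (S_uu * |G|^2))."""
    return 1.0 / (1.0 + S_vv / (S_uu * abs(G) ** 2))


G = 2.0 - 1.0j           # frequency response G(i*omega) at one frequency
S_uu, S_vv = 3.0, 0.5    # input and noise autospectra at that frequency

S_uy = S_uu * G.conjugate()
S_yy = abs(G) ** 2 * S_uu + S_vv
print(coherence(S_uy, S_uu, S_yy), coherence_linear_model(G, S_uu, S_vv))
```

Setting `S_vv = 0` gives a coherence of exactly one, the noise-free linear case.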
3.10 STATISTICAL CHARACTERIZATION OF DISTURBANCES


Thus far we have discussed disturbances in quite general terms except for 
disturbances in the form of an initial condition or special cases of inputs in 
the form of impulses, steps, and sinusoids. Given the transfer function model, 
it is quite straightforward to predict the response to such inputs as they are 
completely determined by their initial behavior. In order to consider more 
complicated disturbances in the form of consecutive events, we thus introduce 
some notions from the theory of stochastic processes, which, as formulated 
by Kolmogorov and Wiener, has emerged from attempts to solve prediction 
problems; see Appendix D. 

A function $x(t) = x(t, \omega)$ whose values depend on a random variable $\omega$ is called a random or stochastic process. For each fixed $\omega$, $x(t, \omega)$ is a function called a trajectory, realization, or sample function. In discrete time we find a stochastic process in the form of a sequence $\{x_k\}$ over some interval of time. It is standard practice to use a statistical model to express the behavior of $x_k$ in terms of a probability distribution function

$$F_k(x) = \mathcal{P}\{x_k \le x\}, \qquad 0 \le F_k(x) \le 1, \quad \forall x \in \mathbb{R} \tag{3.52}$$


where $\mathcal{P}\{x_k \le x\}$ denotes the probability that $x_k \le x$. Given the distribution function, the mathematical expectation can be determined as follows

$$\mathcal{E}\{g(x_k)\} = \int_{-\infty}^{\infty} g(x_k)\,dF(x_k) \tag{3.53}$$

for various functions $g(x_k)$. Several of the concepts introduced earlier, such as covariance functions and spectra, can be applied to stochastic processes by taking the mathematical expectation of the corresponding function.

An important special case of stochastic processes is the white-noise process 
defined below. 


Definition 3.1—White noise 

A sequence of $N$ uncorrelated, identically distributed stochastic variables $\{x_k\}$ with $\mathcal{E}\{x_k\} = \mu_x$ and $\mathcal{E}\{(x_i - \mu_x)(x_j - \mu_x)^T\} = \delta_{ij}\Sigma_x$ for all $i, j$ is known as white noise in the domain of time-series analysis. ■
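As a scalar illustration of Definition 3.1, the following sketch draws a Gaussian white-noise sequence with Python's `random.gauss` (seed, length, and $\sigma$ are arbitrary choices) and estimates the mean and the covariance at a few lags. As $N$ grows, the lag-0 estimate approaches $\sigma^2$ and the other lags approach zero.

```python
import random


def sample_mean_and_autocovariance(x, max_lag):
    """Sample mean and autocovariance estimates c_l, l = 0..max_lag (mean removed)."""
    N = len(x)
    mean = sum(x) / N
    xc = [v - mean for v in x]
    cov = [sum(xc[k] * xc[k - lag] for k in range(lag, N)) / (N - lag)
           for lag in range(max_lag + 1)]
    return mean, cov


rng = random.Random(1)          # fixed seed for reproducibility
sigma = 2.0
x = [rng.gauss(0.0, sigma) for _ in range(20000)]
mean, cov = sample_mean_and_autocovariance(x, 3)
print(mean, cov)                # mean near 0, cov near [4, 0, 0, 0]
```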

It should be borne in mind that there is little possibility, both theoretical and practical, of determining such probabilistic perturbation characteristics in advance, and it may also be difficult to justify any assumptions as to the stochastic nature of a disturbance. Instead, it is a part of the identification procedure to determine such characteristics. In fact, the mean value and the covariance function are much easier to obtain than the probability distribution of the process. This is particularly helpful for the normally distributed white-noise process, which is completely characterized by these two quantities. Let $\{x_k\}$ be a white-noise sequence with normally distributed components $x_k \in \mathcal{N}(\mu_x, \Sigma_x)$. Because the sequence is normally distributed, it is described by its mean

$$\mu_x = \mathcal{E}\{x_k\} \tag{3.54}$$


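and its covariance matrix

$$C_{xx}(i, j) = \mathcal{E}\{(x_i - \mathcal{E}\{x_i\})(x_j - \mathcal{E}\{x_j\})^T\} \tag{3.55}$$

Because the sequence is white with mutually independent $x_k$ values,

$$C_{xx}(i, j) = \mathcal{E}\{(x_i - \mathcal{E}\{x_i\})(x_j - \mathcal{E}\{x_j\})^T\} = \begin{cases} \Sigma_x, & i = j \\ 0, & i \ne j \end{cases} \tag{3.56}$$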


For normally distributed processes it is thus necessary to estimate $\mu_x$ and $\Sigma_x$ as parameters along with the other parameters. A standard approach to justifying assumptions of normal distributions is as follows: Experience suggests that noise encountered in physical systems is often characterized by a normal distribution. In addition, according to the central limit theorem of statistics (see Appendix B), when a large number of small, independent, random effects are superimposed, then, regardless of their individual distributions, the sum of these components is approximately normally distributed. For the same reason, the central limit theorem is used as an approximation theorem in identification theory.


Using an assumed white-noise process $\{u_k\}$ as input to linear systems of the type

$$\begin{cases} x_{k+1} = \Phi x_k + \Gamma u_k \\ y_k = Cx_k + v_k \end{cases} \tag{3.57}$$

provides new stochastic processes whose correlation properties are determined by the matrices $\Phi$, $\Gamma$, and $C$. The random processes thus obtained are stationary (provided that all eigenvalues of $\Phi$ lie strictly inside the unit circle and the initial transient has died out), and they may well be amenable to modeling when the external perturbations and external actions $v(t)$ of a disturbance model are sufficiently regular and stationary in their behavior. However, it is certainly a source of problems in application that most methods rely on assumptions of stationary and normally distributed behavior. For instance, a system that is nonstationary due to parameter variation is difficult to model by such methods, which clearly limits the scope of their application.
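For a scalar instance of (3.57), $x_{k+1} = \phi x_k + \gamma u_k$ with $u_k \in \mathcal{N}(0, \sigma^2)$ and $|\phi| < 1$, the stationary variance is $\gamma^2\sigma^2/(1 - \phi^2)$ and the lag-one correlation is $\phi$. The simulation sketch below (parameter values, seed, and burn-in length are assumptions) compares sample estimates with these values; the burn-in discards the initial transient.

```python
import random


def simulate_ar1(phi, gamma, sigma, n, burn_in=1000, seed=0):
    """x_{k+1} = phi*x_k + gamma*u_k driven by Gaussian white noise u_k."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for k in range(n + burn_in):
        x = phi * x + gamma * rng.gauss(0.0, sigma)
        if k >= burn_in:
            out.append(x)
    return out


phi, gamma, sigma = 0.9, 1.0, 1.0
x = simulate_ar1(phi, gamma, sigma, 50000)
N = len(x)
mean = sum(x) / N
c0 = sum((v - mean) ** 2 for v in x) / N
c1 = sum((x[k] - mean) * (x[k - 1] - mean) for k in range(1, N)) / (N - 1)

theory_var = gamma ** 2 * sigma ** 2 / (1 - phi ** 2)   # about 5.26
print(c0, theory_var, c1 / c0)                           # c1/c0 should be near phi
```

With $\phi$ close to one the samples are strongly correlated, so many more samples are needed for a given accuracy than for white noise.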
As the elementary properties of stochastic processes are assumed to be known to the reader, this aspect will not be elaborated further, though some background material is available in Appendices B and D.


3.11 EXERCISES 

3.1 Standard implementations of control systems have digital-to-analog converters with zero-order hold, i.e., $u$ is constant between the sampling instants. Show that the transfer function $G(s)$ corresponds to the following pulse transfer function

$$H(z) = (1 - z^{-1})\,\mathcal{Z}\left\{\mathcal{L}^{-1}\left\{\frac{G(s)}{s}\right\}\bigg|_{t=kh}\right\} \tag{3.58}$$

in the case of zero-order hold.

3.2 Show that the coherence spectrum fulfills $0 \le \gamma^2(\omega) \le 1$.

3.3 Develop an expression similar to Eq. (3.48) for the signal-to-noise ratio that is valid also for the case where non-zero correlation obtains between the signal $x$ and the perturbation $v$.

3.4 Show that the spectral density of a scalar white-noise process $\{x_k\}$ with $x_k \in \mathcal{N}(0, \sigma^2)$ is constant over the whole frequency range.

3.5 Determine the power spectrum of the variables $x_k$ and $y_k$ of the process (3.57) with a white-noise input, where $v_k \in \mathcal{N}(0, \sigma^2)$.

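3.6 Assume that the covariance of the estimate $\hat\theta$ of the parameter vector

$$\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \tag{3.59}$$

is

$$\Sigma_\theta = \operatorname{cov}\{\hat\theta\} = c\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \tag{3.60}$$

where $c$ and $\rho$ are constants. What is the variance of $\hat\theta_1 + \hat\theta_2$? What choice of $\alpha$ minimizes the variance of the linear combination $\cos\alpha\cdot\hat\theta_1 + \sin\alpha\cdot\hat\theta_2$? Evaluate the 2-norm and the Frobenius norm of the covariance matrix $\Sigma_\theta$; see Appendix A. Give an interpretation of the covariance ellipsoid, i.e., the level surface determined by the equation

$$\theta^T\Sigma_\theta^{-1}\theta = \text{constant} \tag{3.61}$$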
3.7 Consider the problem formulation of Exercise 3.6 and assume $\hat\theta \in \mathbb{R}^n$ to be normally distributed $\mathcal{N}(0, \Sigma_\theta)$. Show that

$$\hat\theta^T\Sigma_\theta^{-1}\hat\theta \in \chi^2(n) \tag{3.62}$$

where $\chi^2(n)$ denotes the $\chi^2$-distribution with $n$ degrees of freedom; see Appendix B. ■



4 Spectrum Analysis


In previous chapters some notions and theories of signals and systems were introduced. We may now turn our attention to applications of these concepts in the analysis of experimental data by means of signal processing. This chapter covers spectral estimation techniques and some standard modifications of these techniques that are used to compensate for distortion due to discretization and finite measurement times. Covariance analysis and frequency response analysis are also treated.

Two important spectral estimation techniques based on Fourier transform operations have evolved: first, spectral estimation based on the direct approach via the discrete Fourier transform, usually referred to as a periodogram; and second, an important indirect approach, the correlogram, obtained by making a correlation estimate with a subsequent discrete Fourier transform based on the Wiener-Khintchine theorem. This methodology can be viewed as a generalization of frequency response analysis where multiple frequencies act simultaneously on the system input.


4.1 THE DISCRETE FOURIER TRANSFORM 

Application of signal processing for spectrum analysis raises several important questions as to sampling strategy, quality of data, filtering, etc. The problems associated with finite sampling rates, finite measurement times, and aliasing naturally motivate the formulation of some sampling strategies for use under normal conditions. One method sometimes recommended for avoiding signal distortion is to sample the signal at a high rate with subsequent discrete-time filtering and data reduction, because analog filters generally cannot provide the linear phase shift that would be required for distortion-free filtering.

The discrete Fourier transform 

As defined in Eq. (3.23), the discrete Fourier transform $X_N(i\omega)$ based on $N$ measurements of a variable $x(t)$ is the sequence $\{X_k\}$ with the components

$$X_k = X_N(i\omega_k) = h\sum_{m=0}^{N-1} x(mh)e^{-i\omega_k mh} \tag{4.1}$$

defined at the discrete frequency points

$$\omega_k = \frac{2\pi}{Nh}k, \qquad k = 0, 1, \dots, N-1 \tag{4.2}$$

The discrete Fourier transform thus evaluates the z-transform on the unit circle

$$X_k = hX_z(e^{i\omega_k h}) \tag{4.3}$$

Notice that the z-transform is generally an infinite series defined on the set of 
all complex-valued numbers for which it attains a finite value (i.e., the region 
of convergence), whereas the discrete transform is only defined for a finite 
number of frequency points. 

Figure 4.1 The discrete Fourier transform applied to sequences consisting of a 1-Hz sinusoidal signal sampled at 10 Hz with 512, 1024, and 2048 data points, respectively.

The fast Fourier transform (FFT) algorithm is an important and efficient implementation of the discrete Fourier transform, although often with the restriction that the number of measurements should be a power of two, i.e., $N = 2^l$ for some integer $l = 2, 3, 4, \dots$.
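A compact recursive radix-2 decimation-in-time FFT (the Cooley-Tukey scheme), given as a sketch for $N = 2^l$ and checked against a direct evaluation of the sum. It computes the unscaled sums; multiplying by $h$ gives (4.1). The test signal is arbitrary, and in practice an optimized library routine would normally be used instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT of sum_m x_m exp(-2*pi*i*k*m/N); N must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])        # DFT of even-indexed samples
    odd = fft(x[1::2])         # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


def dft_direct(x):
    """Direct O(N^2) evaluation of the same unscaled sums."""
    N = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N))
            for k in range(N)]


h = 0.1
x = [math.sin(2 * math.pi * 1.0 * m * h) + 0.3 * math.cos(0.7 * m) for m in range(64)]
X_fft = [h * X_k for X_k in fft(x)]          # scaled as in (4.1)
X_dir = [h * X_k for X_k in dft_direct(x)]
print(max(abs(a - b) for a, b in zip(X_fft, X_dir)))
```

Each recursion level halves the problem, so the cost is proportional to $N\log_2 N$ instead of $N^2$, which is the reason for the power-of-two restriction mentioned above.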

Example 4.1—FFT of a sinusoidal signal 

A sinusoid of the frequency 1 Hz was sampled at the rate of 10 Hz. The fast Fourier transform was applied to each of three different data sets of 512, 1024, and 2048 samples, respectively (Fig. 4.1). Spectral leakage due to the finite measurement intervals is noticeable all over the spectrum. Moreover, the longer the duration of measurement, the lower the lowest non-zero frequency represented in the spectrum, since the frequency spacing is $1/(Nh)$ Hz. ■

Remark: The definition (4.1) of the discrete Fourier transform does not immediately provide the spectrum in the desired frequency range determined by the Nyquist frequency $\omega_N$, i.e., in the interval $[-\omega_N, \omega_N]$. We accomplish this by shifting the periodic frequency scale on the unit circle, so that the components with indices $N-k$ are associated with the negative frequencies $\omega_{-k} = -\omega_k$ for $k = 1, 2, \dots, N/2 - 1$; see Fig. 4.2.
Figure 4.2 Circular permutation of frequency components $z_k = \exp(i\omega_k h)$ for $N = 32$ as applied in the discrete Fourier transform.

Figure 4.3 Power spectrum estimation via different methods.


Notice also that the frequency resolution of the spectrum is determined by the spacing $2\pi/(Nh)$ of the frequency points defined in Eq. (4.2). A longer sequence of data (a longer measurement time $T = Nh$) thus results in a finer resolution. ■
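The reordering described in the Remark amounts to a circular shift by $N/2$ positions. Below is a sketch for even $N$, with illustrative names. The bin $k = N/2$ corresponds to $\pm\omega_N$ and is placed at $-\omega_N$ here, which is a convention choice.

```python
import math


def centered_frequencies(N, h):
    """Angular frequencies for DFT bins, with indices k >= N/2 mapped to k - N (N even)."""
    return [2 * math.pi * (k if k < N // 2 else k - N) / (N * h) for k in range(N)]


def shift_to_centered(values):
    """Reorder DFT output so frequencies run from -omega_N to omega_N - 2*pi/(N*h)."""
    N = len(values)
    return values[N // 2:] + values[:N // 2]


N, h = 8, 0.1
freqs = shift_to_centered(centered_frequencies(N, h))
omega_N = math.pi / h                                         # Nyquist frequency
print([round(f / (2 * math.pi / (N * h))) for f in freqs])    # bin numbers -4..3
```

Applying `shift_to_centered` to the DFT values and to the frequency list keeps each value paired with its frequency.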


4.2 POWER SPECTRUM ESTIMATION 

Two important spectral estimation techniques based on Fourier transform operations have evolved. The direct approach via the discrete Fourier transform is usually referred to as a periodogram, whereas an important indirect approach, the correlogram, is obtained from a correlation estimate with a subsequent discrete Fourier transform based on the Wiener-Khintchine theorem; see Fig. 4.3.

The periodogram 

The periodogram or sample spectrum is defined as

$$\hat S_{xx}(i\omega_k) = \frac{1}{Nh}|X_N(i\omega_k)|^2, \qquad \omega_k = \frac{2\pi}{Nh}k \tag{4.4}$$

and is valuable for graphical inspection of the frequency content of a signal obtained from a system. In cases where an observed variable $x(t)$ consists of sinusoidal waves with superimposed white noise, the periodogram is effective in indicating the discrete frequencies of the sinusoids.

In effect, the periodogram $\hat S_{xx}(i\omega)$ is a sampled version of the spectrum $S_{xx}(i\omega)$ smeared by convolution with the transform of the rectangular window $\mathcal{F}\{\Pi_T(t)\}$ that contains the data samples.
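The sketch below computes the periodogram (4.4) with a direct DFT and applies it to a 1-Hz sinusoid sampled at 10 Hz with added Gaussian white noise. The amplitude, noise level, record length, and seed are assumptions. With $N = 200$ samples the sinusoid completes exactly 20 periods, so its peak falls on bin $k = 20$.

```python
import cmath
import math
import random


def periodogram(x, h):
    """S_xx(i*omega_k) = |X_N(i*omega_k)|^2 / (N*h), X_N from (4.1), omega_k = 2*pi*k/(N*h)."""
    N = len(x)
    S = []
    for k in range(N):
        X_k = h * sum(x[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N))
        S.append(abs(X_k) ** 2 / (N * h))
    return S


rng = random.Random(3)
h, N = 0.1, 200
x = [math.sin(2 * math.pi * 1.0 * m * h) + rng.gauss(0.0, 0.5) for m in range(N)]
S = periodogram(x, h)

half = S[: N // 2 + 1]                           # bins from 0 up to the Nyquist frequency
k_peak = max(range(len(half)), key=half.__getitem__)
print(k_peak, k_peak / (N * h))                  # bin 20, i.e. 1.0 Hz
```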

The correlogram 

An important indirect approach, the correlogram, is obtained by means of a covariance estimate with a subsequent discrete Fourier transform based on the Wiener-Khintchine theorem. The estimated covariance functions are

$$\hat C_{xx}(kh) = \frac{1}{N-k}\sum_{\ell=k}^{N-1} x(\ell h)x^*((\ell-k)h), \qquad \hat C_{xy}(kh) = \frac{1}{N-k}\sum_{\ell=k}^{N-1} x(\ell h)y^*((\ell-k)h) \tag{4.5}$$

where the normalization factor $N-k$ in Eq. (4.5) is statistically motivated and results in unbiased estimates $\hat C_{xx}$, $\hat C_{xy}$ of the covariances $C_{xx}$ and $C_{xy}$, respectively. The values of the covariance functions are often arranged in the matrix

$$R_{xx}(n) = \begin{pmatrix} C_{xx}(0) & C_{xx}(-1) & \cdots & C_{xx}(-n+1) \\ C_{xx}(1) & C_{xx}(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & C_{xx}(-1) \\ C_{xx}(n-1) & \cdots & C_{xx}(1) & C_{xx}(0) \end{pmatrix} \tag{4.6}$$
which has a Toeplitz matrix structure, i.e., the matrix is constant along its diagonals, a property that can be used for organizing efficient numerical algorithms; see Chapter 11. The subsequent power spectrum calculation is made according to the expressions

$$\hat S_{xx}(i\omega_k) = h\sum_{m=0}^{N-1}\hat C_{xx}(mh)e^{-i\omega_k mh}, \qquad \hat S_{xy}(i\omega_k) = h\sum_{m=0}^{N-1}\hat C_{xy}(mh)e^{-i\omega_k mh} \tag{4.7}$$

defined for the discrete frequency points $\omega_k = (2\pi/Nh)k$, where $k = 0, 1, \dots, N-1$. The correlograms and periodograms for a pure sinusoid of 1 Hz sampled at 10 Hz are compared in Fig. 4.4.
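The following sketch implements the unbiased covariance estimate (4.5), the Toeplitz matrix (4.6), and a correlogram evaluated from lags $-M, \dots, M$. The symmetric lag range follows the windowless correlogram in Table 4.1, and the maximum lag $M$ is a choice of this sketch; neither is fixed by (4.7). The data are assumed to have zero mean, since (4.5) does not subtract one.

```python
import cmath
import math


def cov_estimate(x, y, k):
    """C_xy(kh) = 1/(N-k) * sum_{l=k}^{N-1} x_l * conj(y_{l-k}), 0 <= k < N, cf. (4.5)."""
    N = len(x)
    return sum(x[ell] * complex(y[ell - k]).conjugate() for ell in range(k, N)) / (N - k)


def autocov(x, k):
    """C_xx at lag k of either sign, using C_xx(-k) = conj(C_xx(k))."""
    return cov_estimate(x, x, k) if k >= 0 else cov_estimate(x, x, -k).conjugate()


def toeplitz_matrix(x, n):
    """R_xx(n) from (4.6): entry (i, j) is C_xx(i - j)."""
    return [[autocov(x, i - j) for j in range(n)] for i in range(n)]


def correlogram(x, h, max_lag, omega):
    """h * sum_{m=-M}^{M} C_xx(mh) * exp(-i*omega*m*h); real for an autospectrum."""
    total = sum(autocov(x, m) * cmath.exp(-1j * omega * m * h)
                for m in range(-max_lag, max_lag + 1))
    return (h * total).real


R = toeplitz_matrix([1.0, 2.0, 3.0, 4.0], 3)
print([[round(v.real, 4) for v in row] for row in R])

alternating = [(-1.0) ** n for n in range(64)]     # all power at the Nyquist frequency
print(correlogram(alternating, 1.0, 8, 0.0), correlogram(alternating, 1.0, 8, math.pi))
```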


4.3 SPECTRAL LEAKAGE AND WINDOWING 


All the methods above are based on Fourier transform theory for infinitely long measurement sequences. In practice, a measurement cannot take infinite time, so only $N$ data points from a finite time interval are available. The truncation causes spectral leakage, which in turn entails a systematic distortion of the calculated spectra; see Section 3.5.

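The discrete Fourier transform based on $N$ measurements, as already formulated, can be viewed as the truncated z-transform

$$Y_N(i\omega) = h\sum_{m=0}^{N-1} y(mh)\,e^{-i\omega hm} = h\sum_{m=-\infty}^{\infty} \Pi_N(m)\,y(mh)\,e^{-i\omega hm} \qquad (4.8)$$

where

$$\Pi_N(k) = \begin{cases} 0, & k < 0 \\ 1, & 0 \le k < N \\ 0, & k \ge N \end{cases} \qquad (4.9)$$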


The Fourier transform of a product can be expressed as the convolution of the Fourier transforms of the two factors. Hence, from Eq. (3.26), we have

$$Y_N(i\omega) = \mathcal{F}\{y(t)\cdot \mathrm{III}_h(t)\cdot \Pi_T(t)\} = Y(i\omega) * \mathcal{F}\{\mathrm{III}_h(t)\,\Pi_T(t)\} \qquad (4.10)$$
Figure 4.4 Periodograms (upper) and correlograms (lower) for a pure sinusoid of 1 Hz sampled at 10 Hz.


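where the sampling-dependent final factor, i.e., the impulse train $\mathrm{III}_h(t)$ at the sampling instants truncated by the rectangular window $\Pi_T(t)$ of duration $T = Nh$, has the Fourier transform

$$\mathcal{F}\{\mathrm{III}_h(t)\,\Pi_T(t)\} = \mathcal{Z}\{\Pi_N(k)\}\big|_{z=e^{i\omega h}} = \frac{1-z^{-N}}{1-z^{-1}}\bigg|_{z=e^{i\omega h}} = e^{-i\omega(N-1)h/2}\,\frac{\sin(\omega Nh/2)}{\sin(\omega h/2)} \qquad (4.11)$$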

The distortion arising from convolution with the second factor is a systematic problem. The characteristic “sidelobes” of Eq. (4.11), i.e., of the Fourier transform of a rectangular time-domain window (Fig. 3.4), cause spectral leakage and a bias in the amplitude and the position of a harmonic estimate. In addition, spectral leakage has a deleterious impact both on power spectrum estimation and on the detectability of sinusoidal components.
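A quick numerical check of Eq. (4.11), assuming h = 1 and hypothetical helper names: the direct sum of z^(-k) over the N samples of the rectangular window is compared with the closed form, and the highest sidelobe of |W| is measured relative to the main-lobe peak N. For long records this ratio approaches about 0.217 (roughly -13 dB), which is the leakage level that windowing aims to reduce.

```python
import cmath
import math


def window_sum(omega, n, h=1.0):
    """Direct evaluation of Z{Pi_N(k)} = sum_{k=0}^{N-1} z^{-k} at z = exp(i*omega*h)."""
    return sum(cmath.exp(-1j * omega * h * k) for k in range(n))


def window_closed(omega, n, h=1.0):
    """Closed form of Eq. (4.11): linear-phase factor times sin(omega*N*h/2)/sin(omega*h/2)."""
    s = math.sin(omega * h / 2)
    if abs(s) < 1e-15:
        return complex(n)            # limit where z = 1 (omega a multiple of 2*pi/h)
    phase = cmath.exp(-1j * omega * h * (n - 1) / 2)
    return phase * math.sin(omega * n * h / 2) / s


def peak_sidelobe_ratio(n, points=4000):
    """Largest |W| between the first two nulls, relative to the main-lobe peak N (h = 1)."""
    first_null, second_null = 2 * math.pi / n, 4 * math.pi / n
    step = (second_null - first_null) / points
    return max(abs(window_closed(first_null + i * step, n))
               for i in range(1, points)) / n


print(abs(window_sum(0.3, 16) - window_closed(0.3, 16)))   # agreement up to rounding
print(peak_sidelobe_ratio(1024))                            # roughly 0.217, about -13 dB
```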

To reduce the effects of this bias, the window should exhibit low-amplitude sidelobes far from the main central lobe, and the transition to the low sidelobes should be very rapid. In fact, a given spectral component, say at $\omega = \omega_0$, will also be observed at another frequency, say at $\omega = \omega_a$, according to the gain of a window obtained from (4.11), centered at $\omega_0$ and measured at $\omega_a$.

Table 4.1 Application of spectrum estimation to experimental data

Discrete Fourier transform ($N$ measurements):
$$X_N(i\omega_k) = h\sum_{m=0}^{N-1} x(mh)\,e^{-i\omega_k mh}, \quad \text{where } \omega_k = \frac{2\pi}{Nh}k$$

Periodogram, sample spectrum:
$$\hat S_{xx}(i\omega) = \frac{1}{Nh}\,|X_N(i\omega)|^2$$

Correlogram:
$$\hat S_{yu}(i\omega_k) = h\sum_{m=-N/2}^{N/2-1} \hat C_{yu}(mh)\,e^{-i\omega_k mh}$$

Correlogram with time window:
$$\hat S_{yu}(i\omega_k) = h\sum_{m=-N/2}^{N/2-1} \hat C_{yu}(mh)\,w(mh)\,e^{-i\omega_k mh}$$

Example 4.2—Spectral leakage 
Assume that

$$u(t) = \sin(\omega_1 t), \qquad \omega_1 = 2\pi \text{ rad/s}, \qquad N = 1024 \qquad (4.12)$$

The spectral leakage in a spectrum obtained without windowing is clearly visible; see Fig. 4.5. The lower graphs of Fig. 4.5 show the reduced spectral leakage obtained by applying a Bartlett triangular window. Also notice the gross distortion of the time-domain data that results from using the window function. ■
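The following sketch reproduces the setting of Example 4.2 and Fig. 4.5 numerically (a 1 Hz sinusoid sampled at 100 Hz, N = 1024). Since 1024 samples at 100 Hz span 10.24 s, the record holds a non-integer number of periods, so the sinusoid falls between DFT bins and leaks. The leakage measure (largest magnitude in a band of far-away bins relative to the peak) and the exact form of the discrete Bartlett window are assumptions made for this illustration.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X_k = sum_m x_m exp(-2*pi*i*k*m/N) (factor h omitted)."""
    n = len(x)
    return sum(xm * cmath.exp(-2j * math.pi * k * m / n) for m, xm in enumerate(x))


def bartlett(n):
    """Triangular (Bartlett) window that is zero at both ends of the record."""
    half = (n - 1) / 2
    return [1.0 - abs(m - half) / half for m in range(n)]


def leakage(x, peak_bins, far_bins):
    """Largest far-away DFT magnitude relative to the largest magnitude near the peak."""
    peak = max(abs(dft_bin(x, k)) for k in peak_bins)
    far = max(abs(dft_bin(x, k)) for k in far_bins)
    return far / peak


fs, n = 100.0, 1024                                         # 100 Hz sampling, N = 1024
u = [math.sin(2 * math.pi * 1.0 * m / fs) for m in range(n)]   # 1 Hz sinusoid
uw = [um * wm for um, wm in zip(u, bartlett(n))]             # windowed record

peak_bins = range(8, 13)   # 1 Hz lies at k = N/fs = 10.24, between bins 10 and 11
far_bins = range(60, 70)   # roughly 5.9 to 6.7 Hz
rect_leak = leakage(u, peak_bins, far_bins)
bart_leak = leakage(uw, peak_bins, far_bins)
print(rect_leak, bart_leak)   # the Bartlett-windowed leakage is far smaller
```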
Figure 4.5 Discrete Fourier transform of a sinusoid (1 Hz) sampled at 100 Hz with N = 1024 samples. The spectral leakage in a spectrum obtained without windowing is clearly visible. The lower graphs show the reduced spectral leakage obtained by applying a Bartlett triangular window.


Window carpentry 

A remedy to the spectral leakage problem is to introduce window functions, i.e., weight functions applied to the data to reduce the spectral leakage associated with the finite observation interval. Let the discrete Fourier transform include a window function $\{w_m\}$ in the form of a sequence that multiplies the data points:

$$Y_k = h\sum_{m=0}^{N-1} y(mh)\,w_m\,e^{-i\omega_k mh} \qquad (4.13)$$

A major problem with using Fourier transforms based on finite measurements is that they force a periodic extension of both the discretized data and the discretized transform values, whereas windowing presupposes that data outside the measurement interval are zero. The weighting function may be chosen as any function satisfying

$$w(\tau) = \begin{cases} 1, & \tau = 0 \\ 0, & \tau \text{ large} \end{cases} \qquad (4.14)$$

Figure 4.6 Some window functions commonly used to reduce spectral leakage. The time axis is expressed in normalized time, i.e., $t/T_m$ takes on values over the interval $[-1, 1]$.


Moreover, to minimize spectral leakage, the derivative of the window function should be small for large values of $\tau$. A time window $w(\tau)$ is often characterized in the frequency domain by its Fourier transform $W(\omega)$, the spectral window. The rectangular window is always present in serial data recording, since only a finite number $N$ of measurements is available; see Table 3.1.

$$w_{T_m}(t) = \begin{cases} 1, & |t| \le T_m \\ 0, & |t| > T_m \end{cases} \qquad (4.15)$$

Other commonly used windows are those of Bartlett, Hamming, Hanning, Kaiser-Bessel, Laplace-Gauss, and others (see Table 4.2 and Fig. 4.6).
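Table 4.2 Some time windows commonly used in spectral analysis. Notice that all time windows are zero for time lag $\tau$ outside the interval $[-T_m, T_m]$.

Rectangular
  Time window: $w_{T_m}(\tau) = 1$ for $|\tau| \le T_m$, $w_{T_m}(\tau) = 0$ for $|\tau| > T_m$
  Spectral window: $W_{T_m}(\omega) = 2T_m\,\dfrac{\sin(\omega T_m)}{\omega T_m}$

Bartlett
  Time window: $\left(1 - \dfrac{|\tau|}{T_m}\right) w_{T_m}(\tau)$
  Spectral window: $T_m\left(\dfrac{\sin(\omega T_m/2)}{\omega T_m/2}\right)^2$

Hanning
  Time window: $\frac{1}{2}\left(1 + \cos\dfrac{\pi\tau}{T_m}\right) w_{T_m}(\tau)$
  Spectral window: $\frac{1}{2}W_{T_m}(\omega) + \frac{1}{4}W_{T_m}\!\left(\omega + \frac{\pi}{T_m}\right) + \frac{1}{4}W_{T_m}\!\left(\omega - \frac{\pi}{T_m}\right)$

Hamming
  Time window: $\left(0.54 + 0.46\cos\dfrac{\pi\tau}{T_m}\right) w_{T_m}(\tau)$
  Spectral window: $0.54\,W_{T_m}(\omega) + 0.23\,W_{T_m}\!\left(\omega + \frac{\pi}{T_m}\right) + 0.23\,W_{T_m}\!\left(\omega - \frac{\pi}{T_m}\right)$

Laplace-Gauss (width parameter $\sigma$)
  Time window: $e^{-\frac{1}{2}(\tau/\sigma)^2}\, w_{T_m}(\tau)$
  Spectral window: $\dfrac{\sigma}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(\sigma\omega)^2} * W_{T_m}(\omega)$

Kaiser-Bessel
  Time window: $\dfrac{I_0\!\left(\beta\sqrt{1 - (\tau/T_m)^2}\right)}{I_0(\beta)}\, w_{T_m}(\tau)$
  Spectral window: (numerical transform)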
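The time windows of Table 4.2 translate directly into functions of the lag τ and the half-width T_m. In this sketch the Kaiser-Bessel shape parameter β = 6 is an assumed value, and I_0 is evaluated from its power series because Python's `math` module does not provide it; the Laplace-Gauss window is omitted since its width parameter must be chosen separately.

```python
import math


def rectangular(tau, tm):
    """w_Tm(tau): 1 inside [-Tm, Tm], 0 outside."""
    return 1.0 if abs(tau) <= tm else 0.0


def bartlett(tau, tm):
    return (1.0 - abs(tau) / tm) * rectangular(tau, tm)


def hanning(tau, tm):
    return 0.5 * (1.0 + math.cos(math.pi * tau / tm)) * rectangular(tau, tm)


def hamming(tau, tm):
    return (0.54 + 0.46 * math.cos(math.pi * tau / tm)) * rectangular(tau, tm)


def bessel_i0(x, terms=40):
    """Modified Bessel function I0 from its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2) / k          # term = (x/2)^k / k!
        total += term * term
    return total


def kaiser(tau, tm, beta=6.0):
    """Kaiser-Bessel window; beta = 6 is an assumed shape parameter."""
    if abs(tau) > tm:
        return 0.0
    return bessel_i0(beta * math.sqrt(1.0 - (tau / tm) ** 2)) / bessel_i0(beta)


tm = 2.0
for name, f in [("rect", rectangular), ("bartlett", bartlett), ("hanning", hanning),
                ("hamming", hamming), ("kaiser", kaiser)]:
    print(name, [round(f(t, tm), 3) for t in (0.0, 1.0, 2.0)])
```

The Hamming window does not reach zero at |τ| = T_m (it ends at 0.08), a trade-off that lowers its nearest sidelobes relative to Hanning at the price of a slower far-sidelobe decay.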
Example 4.3—Spectral leakage in frequency response analysis
Frequency response analysis (cf. Section 2.3) is also affected by spectral distortion due to finite measurement time. In fact, the same analysis of spectral leakage and windowing applies to this method. Frequency response analysis may be viewed as a filter with certain rejection properties for periodic disturbances; see Fig. 4.7. In particular, the sensitivity to a periodic disturbance of the same frequency as the input test signal $u = \sin\omega_0 t$ is always important. The damping of frequencies different from the test frequency improves with longer measurement duration.

Figure 4.7 Transfer function of frequency response analysis with a fixed integration time $T$. The peak around the normalized test frequency $\omega = 1$ becomes more prominent for longer measurement times $T$. The frequency axis represents normalized frequencies.

Assume that $y(t)$ is periodic for all $t$, and let the integration time $T$ be fixed at $T = k\cdot(2\pi/\omega_0)$. It is then possible to derive the following transfer functions from the system output $y$ to the sine and cosine channels, respectively. Consider, for instance, the sine channel

$$s_T(\omega_0) = \int_0^T y(t)\sin(\omega_0 t)\,dt \qquad (4.16)$$

and the convolution

$$y_s(t) = y(t) * g_s(t) = \int_0^t y(\tau)\sin(\omega_0(t-\tau))\,d\tau \qquad (4.17)$$

Notice that $\sin\omega_0\tau = \sin\omega_0(T-\tau)$ for $T$ such that $\omega_0 T = \pi, 3\pi, 5\pi, \ldots$, since then $\sin\omega_0 T = 0$ and $\cos\omega_0 T = -1$. Hence, the functions $s_T(\omega_0)$ and $y_s(T)$ take on the same value for such $T$. The integral $y_s(t)$ of Eq. (4.17) can be viewed as a convolution between $y$ and the function

$$g_s(t) = \sin(\omega_0 t)\cdot\Pi_T(t) \qquad (4.18)$$

Figure 4.8 A test signal for frequency response analysis modified with a time window.

The frequency properties of the sine channel may thus be understood in terms of a transfer function from the input $y$ to the output $y_s$:

$$\mathcal{F}\{y_s(t)\} = G_s(i\omega)\cdot Y(i\omega) \qquad (4.19)$$

where

$$G_s(i\omega) = \mathcal{F}\{g_s(t)\} = \mathcal{F}\{\sin(\omega_0 t)\cdot\Pi_T(t)\} \qquad (4.20)$$

This transfer function is obviously non-zero for $\omega \ne \omega_0$. Disturbances of any spectral composition entering into the frequency response analysis may therefore affect its result. However, the transfer function tends to suppress frequencies different from the test frequency $\omega_0$ for long experiment durations.

$$\frac{Y_s(i\omega)}{Y(i\omega)} = G_s(i\omega) = \mathcal{F}\{g_s(t)\} \qquad (4.21)$$
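For the rectangular integration window Π_T(t) on [0, T], the integral in (4.20) has a closed form, which makes the trend in Fig. 4.7 easy to check. The sketch below (helper name hypothetical) evaluates |G_s(iω)| and compares the gain at an off-test frequency, ω = 1.25ω₀, with the gain at ω₀ for integration over 1 and 21 periods. At ω = ω₀ over whole periods the gain is T/2, so the relative off-frequency gain falls as T grows.

```python
import cmath
import math


def sine_channel_gain(omega, omega0, t_int):
    """|G_s(i*omega)| with G_s(i*omega) = int_0^T sin(omega0*t) exp(-i*omega*t) dt, Eq. (4.20)."""
    def seg(d):
        # (exp(i*d*T) - 1) / d, replaced by its limit i*T when d = 0
        if abs(d) < 1e-12:
            return 1j * t_int
        return (cmath.exp(1j * d * t_int) - 1) / d

    # sin(w0 t) = (exp(i w0 t) - exp(-i w0 t)) / (2i), integrated term by term
    return abs(-0.5 * (seg(omega0 - omega) - seg(-(omega0 + omega))))


w0 = 1.0
for periods in (1, 21):
    t_int = 2 * math.pi * periods / w0
    peak = sine_channel_gain(w0, w0, t_int)             # equals T/2 over whole periods
    rel = sine_channel_gain(1.25 * w0, w0, t_int) / peak
    print(periods, peak, rel)                            # rel shrinks as T grows
```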
Suppression of spectral leakage for frequencies $\omega \ne \omega_0$ of the transfer function in (4.21) may be enhanced by introducing a time window similar to that used in spectral analysis. For the sine channel, we obtain

$$s_T(\omega_0) = \int_0^T y(t)\,w(t)\sin(\omega_0 t)\,dt, \qquad \omega_0 T = \pi, 3\pi, 5\pi, \ldots \qquad (4.22)$$

An example of a window function applied to a test sequence is given in Fig. 4.8, where it is obvious that the modification of the test signal is considerable.


4.4 TRANSFER FUNCTION ESTIMATION 


Estimation of transfer functions by methods of spectrum analysis is effective and can be viewed as a generalization of frequency response analysis in which the test signal is composed of sinusoids at many frequencies acting simultaneously on the system input. As with spectrum estimation, there are two fundamental methods based on the Fourier transform that may be used for spectral estimation of transfer functions. One method is based on the discrete Fourier transforms of input and output data. The transfer function estimate is then obtained, according to the definition of transfer functions, as the ratio between the discrete Fourier transforms of outputs and inputs:

$$\hat H_1(e^{i\omega h}) = \frac{Y_N(i\omega)}{U_N(i\omega)} \qquad (4.23)$$

The estimate $\hat H_1(e^{i\omega h})$ is, of course, only defined for the discrete frequency points $\omega = \omega_k$ obtained from the discrete Fourier transform.

A second method relies on the ratio between the input-output cross spectrum and the input autospectrum, as obtained via the discrete Fourier transform of the cross-covariance and autocovariance functions:

$$\hat H_2(e^{i\omega h}) = \frac{\hat S_{yu}(i\omega)}{\hat S_{uu}(i\omega)} \qquad (4.24)$$


The noise characteristics of (4.23) and (4.24) are, of course, different, as they depend on the noise spectrum and on the input-disturbance cross-covariance, respectively. The expected contribution from the disturbance $v$ to $\hat S_{yu}(i\omega)$ is small in cases where the input and the disturbance are uncorrelated, whereas the disturbance contribution to $Y_N(i\omega)$, and thus to $\hat H_1(e^{i\omega h})$, might be considerable. Thus, the method (4.24) can be expected to yield better results than (4.23) in cases where the input and the disturbance are uncorrelated. Such averaging methods can be generalized by means of so-called higher-order cumulants or higher-order spectra.

Figure 4.9 Transfer function estimates $\hat H_1(i\omega) = Y_N(i\omega)/U_N(i\omega)$ and $\hat H_2(i\omega) = \hat S_{yu}(i\omega)/\hat S_{uu}(i\omega)$ obtained from a system $G(s) = 1/(s+1)$. The input (solid line) is a swept-frequency sinusoid and is shown together with the output (dashed line) in the upper diagram.
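A noise-free sketch of the two estimators (4.23) and (4.24), assuming h = 1 and a hypothetical FIR system. Each input block is filtered circularly, which mimics the periodic steady state, so that the output DFT equals H(e^{iω_k})·U_N(iω_k) exactly and both estimates should reproduce the true frequency response at every bin. The estimate (4.24) is formed here from block-averaged cross- and autoperiodograms; common scale factors (h, 1/N, the number of blocks) cancel in the ratio.

```python
import cmath
import math
import random


def dft(x):
    """Direct DFT X_k = sum_m x_m exp(-2*pi*i*k*m/N) (h = 1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def circular_filter(h, u):
    """y_m = sum_j h_j u_{(m-j) mod N}: periodic steady-state output of an FIR system."""
    n = len(u)
    return [sum(h[j] * u[(m - j) % n] for j in range(len(h))) for m in range(n)]


def h1_estimate(y, u):
    """Eq. (4.23): ratio of the output and input DFTs at each bin."""
    return [yk / uk for yk, uk in zip(dft(y), dft(u))]


def h2_estimate(y_blocks, u_blocks):
    """Eq. (4.24): block-averaged cross spectrum divided by the input autospectrum."""
    n = len(u_blocks[0])
    s_yu, s_uu = [0j] * n, [0.0] * n
    for yb, ub in zip(y_blocks, u_blocks):
        yf, uf = dft(yb), dft(ub)
        for k in range(n):
            s_yu[k] += yf[k] * uf[k].conjugate()
            s_uu[k] += abs(uf[k]) ** 2
    return [s_yu[k] / s_uu[k] for k in range(n)]


rng = random.Random(1)
n = 32
h_true = [0.5, 0.3, 0.2]                                     # hypothetical FIR system
u_blocks = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]
y_blocks = [circular_filter(h_true, ub) for ub in u_blocks]
H_true = dft(h_true + [0.0] * (n - len(h_true)))             # H(e^{i w_k}) at the bins
H1 = h1_estimate(y_blocks[0], u_blocks[0])
H2 = h2_estimate(y_blocks, u_blocks)
```

With a disturbance added to y, Ĥ₁ inherits it at every bin, while the averaged cross spectrum in Ĥ₂ tends to suppress the part that is uncorrelated with u, in line with the discussion above.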


Example 4.4—Transfer function estimates 

A modified form of frequency response analysis uses a sinusoidal input with a swept frequency (cf. Fig. 4.9). This input is known as a chirp signal and is effective in producing a rapid form of frequency response analysis. Both methods (4.23) and (4.24) work appropriately in frequency ranges with a non-zero input spectrum. On the other hand, problems can be expected to arise in frequency ranges where $U_N(i\omega)$ and $\hat S_{uu}(i\omega)$ are of low amplitude or where disturbances contribute to the spectral contents.

Evaluation of the coherence spectrum is valuable for the detection of disturbance levels prior to estimation of transfer functions. ■
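The coherence spectrum mentioned above can be estimated as |Ŝ_yu|²/(Ŝ_uu·Ŝ_yy) from block-averaged periodograms. The sketch below (hypothetical names and test system) also shows why averaging is essential: with a single block the estimate equals 1 at every frequency regardless of the disturbance level, by the Cauchy-Schwarz equality, whereas averaging over blocks exposes the disturbance.

```python
import cmath
import math
import random


def dft(x):
    """Direct DFT X_k = sum_m x_m exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def coherence(y_blocks, u_blocks):
    """Squared coherence |S_yu|^2 / (S_uu * S_yy) from block-averaged periodograms."""
    n = len(u_blocks[0])
    s_yu, s_uu, s_yy = [0j] * n, [0.0] * n, [0.0] * n
    for yb, ub in zip(y_blocks, u_blocks):
        yf, uf = dft(yb), dft(ub)
        for k in range(n):
            s_yu[k] += yf[k] * uf[k].conjugate()
            s_uu[k] += abs(uf[k]) ** 2
            s_yy[k] += abs(yf[k]) ** 2
    return [abs(s_yu[k]) ** 2 / (s_uu[k] * s_yy[k]) for k in range(n)]


rng = random.Random(2)
n, n_blocks = 16, 8
u_blocks = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_blocks)]
# hypothetical system: static gain 0.8 plus an independent unit-variance disturbance
y_blocks = [[0.8 * uk + rng.gauss(0.0, 1.0) for uk in ub] for ub in u_blocks]

single = coherence(y_blocks[:1], u_blocks[:1])   # one block: equal to 1 at every bin
averaged = coherence(y_blocks, u_blocks)         # averaged: clearly below 1
```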
Parametric models

Simple parametric models such as a rational transfer function may be obtained by manually fitting asymptotes to the Bode diagram. The procedure involves estimation of the low-frequency gain by fitting asymptotes to the gain plot. Each numerator factor $(s + q_i)$ contributes significant phase advance for $|s| = \omega > q_i$, whereas each denominator factor $(s + p_i)$ contributes significant phase lag for $|s| = \omega > p_i$.

Some important features can be extracted from the Bode diagram, such as the static gain, resonances (from the gain diagram), or dead time (from the phase plot).

Some statistical properties of transfer function estimates $\hat H(e^{i\omega h})$

The quality of a transfer function estimate depends on many factors, such as the statistical properties of covariance and spectrum estimates and of the discrete Fourier transform, as well as on the experiment that has generated the data used in identification. A technical problem is that the transfer function estimate is not one single number. Instead, it is defined for a sequence of $N$ frequency points $\omega_k$, for each of which the accuracy must be evaluated. Prior to estimation of transfer functions, it is valuable to evaluate the coherence spectrum for assessment of linear dependence and for detection of disturbance levels. It is therefore advisable to provide the coherence spectrum in support of an estimated transfer function.

Second, it is possible to derive statistical properties and criteria for evaluation of the quality of transfer function estimates based on spectrum analysis. Assuming the input to be periodic, and the measurement time $N$ (in samples) to be a multiple of the period, it can be shown that the following statistical properties hold:

o $\hat H(e^{i\omega h})$ is only defined for a fixed number of points (which follows from Eq. (3.26)).

o $\hat H(e^{i\omega h})$ is unbiased at these frequencies, and its variance decreases as $N$ grows.

If the input may be considered to be a stochastic process, the following important properties should be borne in mind:

o The estimate $\hat H(e^{i\omega h})$ is asymptotically unbiased as the number of observations $N$ increases.

o The variance of the transfer function estimate at a given frequency point does not decrease as $N$ grows. Instead, the signal-to-noise ratio determines the accuracy at each frequency.

o The spectrum estimates at different frequencies are asymptotically uncorrelated.

Notice that the variance of $\hat H(e^{i\omega h})$ does not decrease as the number of data points $N$ grows, because there are as many independent estimates as there are data points (which means that choosing a longer data record produces no effective averaging). A remedy to this problem is to subdivide the measurement series into blocks and to average the results obtained from the spectral analysis of each block. The number of frequency points is then fixed by the block length, and an improved estimate may be obtained provided that $\hat H(e^{i\omega h})$ has small variation within each block.
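The block-averaging remedy can be illustrated on the simplest case, the periodogram of white noise, where the true spectrum is flat and any spread across frequency bins is pure estimation error; the same mechanism governs Ĥ(e^{iωh}). This is a sketch with a small recursive radix-2 FFT written out for self-containment, and the spread measure (variance divided by squared mean over the interior bins) is an assumption of the illustration. For Gaussian white noise the periodogram ordinates have a spread near 1 whatever the record length, while averaging B blocks reduces it to roughly 1/B.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def periodogram(x):
    n = len(x)
    return [abs(v) ** 2 / n for v in fft(x)]


def averaged_periodogram(x, block_len):
    """Split x into non-overlapping blocks and average their periodograms."""
    blocks = [x[i:i + block_len] for i in range(0, len(x) - block_len + 1, block_len)]
    specs = [periodogram(b) for b in blocks]
    return [sum(col) / len(specs) for col in zip(*specs)]


def relative_spread(s):
    """Variance / mean^2 over the bins strictly between DC and the Nyquist bin."""
    inner = s[1:len(s) // 2]
    mean = sum(inner) / len(inner)
    var = sum((v - mean) ** 2 for v in inner) / len(inner)
    return var / mean ** 2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1024)]            # white noise: flat spectrum
full = relative_spread(periodogram(x))                     # close to 1
blocked = relative_spread(averaged_periodogram(x, 64))     # roughly 1/16 (16 blocks)
print(full, blocked)
```

The price of averaging is resolution: each 64-sample block yields only 64 frequency points, which is why the text requires small variation of the estimate within each block.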

Inverse Laplace transformation 

Impulse and step responses may be calculated from transfer function estimates via inverse Laplace transformation. However, the transfer function estimates always have a periodic extension to higher frequency ranges. Truncations of spectra and other operations on the spectral estimates may thus result in gross distortion of the impulse response estimates obtained from inverse Fourier transforms. It is therefore useful to consider the application of spectral window functions prior to the calculation of the inverse Laplace transformation.


4.5 SMOOTHING OF SPECTRA

Statistically inconsistent results in spectrum analysis can occur, for instance, during estimation of the spectrum of a stochastic process generated by passing white noise through a linear system or filter, in which case the variance of the spectral estimates does not decrease as the number of data grows. Therefore, there is a need for some sort of ensemble averaging or smoothing of the sample spectrum, and we distinguish among three different methods available for the purpose.

o Windowing offers a trade-off between spectral resolution and smoothness, the trade-off depending on the choice of window.

o Block segmentation of data consists of splitting the data into segments, computing the periodogram for each segment, and averaging the periodograms over all the segments.

o Zero padding consists of computing a DFT of a sequence of $N$ data points extended with a sequence of $k\cdot N$ zeros. Transforming a zero-padded data set only serves to interpolate additional spectrum values within the frequency interval $[-\omega_n, \omega_n]$. The additional values of the spectrum, computed by an FFT applied to the zero-padded data set, fill in the shape of the continuous-frequency periodogram. Zero padding is useful for smoothing the appearance of the spectrum estimate and for resolving potential ambiguities, but it does not improve the fundamental frequency resolution, i.e., the reciprocal of the measurement interval.

Figure 4.10 Smoothing of a spectrum of a signal consisting of a sinusoid $y(t)$ of 1 Hz confounded with noise with a signal-to-noise ratio $S/R = 1$. Smoothing can be obtained with block segmentation and windowing, whereas higher spectral resolution can be obtained by windowing and zero padding.

The three methods are demonstrated in Fig. 4.10 for a case with a sinusoid 
of frequency 1 Hz confounded with noise with a signal-to-noise ratio equal to 
one. 
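The statement that zero padding only interpolates can be verified directly. In this sketch (arbitrary 8-point data, k = 3 appended blocks of zeros), the padded DFT is evaluated on a frequency grid four times denser, and every fourth padded value coincides with the original 8-point DFT; the values in between are samples of the same continuous-frequency transform, so no new resolution is gained.

```python
import cmath
import math


def dft(x):
    """Direct DFT X_k = sum_m x_m exp(-2*pi*i*k*m/len(x))."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


x = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.3]   # arbitrary 8-point record
k_pad = 3                                            # append k*N = 24 zeros
X = dft(x)
X_pad = dft(x + [0.0] * (k_pad * len(x)))            # 32 bins, 4 times denser grid

# Every (k_pad + 1)-th padded bin reproduces an original bin; the rest interpolate.
print(max(abs(X_pad[(k_pad + 1) * k] - X[k]) for k in range(len(X))))
```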

Another important aspect in this context is that the computation of $\hat S_{yu}(i\omega)$ from $\hat C_{yu}(\tau)$ has poor statistical and numerical properties. The quality of the estimate of $C_{yu}(\tau)$ deteriorates as $\tau$ grows, since fewer products are averaged at large lags; this can be remedied by giving more weight to $\hat C_{yu}(kh)$ for small values of $\tau$ in the calculations:

$$\hat S_{yu}(i\omega) = h\sum_{k=-N/2}^{N/2-1} \hat C_{yu}(kh)\,w(kh)\,e^{-i\omega hk} \qquad (4.25)$$
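A useful cross-check of (4.25), under assumptions stated here: with h = 1, unbiased covariance estimates as in (4.28), and the triangular lag window w(k) = 1 - |k|/N summed over all lags |k| ≤ N - 1 (rather than the -N/2 ≤ k < N/2 range of (4.25)), the lag-windowed correlogram coincides with the periodogram |X(ω)|²/N. The sketch below (hypothetical helper names) evaluates both at a few frequencies; they agree up to rounding.

```python
import cmath


def cov_unbiased(x, k):
    """Unbiased autocovariance estimate at lag k >= 0, as in Eq. (4.28) (zero-mean data)."""
    n = len(x)
    return sum(x[i] * x[i - k] for i in range(k, n)) / (n - k)


def lag_windowed_correlogram(x, omega, window):
    """Eq. (4.25) with h = 1, summed over all lags |k| <= N - 1."""
    n = len(x)
    total = 0j
    for k in range(-(n - 1), n):
        c = cov_unbiased(x, abs(k))                    # C(-k) = C(k) for real data
        total += c * window(k, n) * cmath.exp(-1j * omega * k)
    return total


def triangular_lag_window(k, n):
    """Bartlett lag window w(k) = 1 - |k|/N."""
    return 1.0 - abs(k) / n


def periodogram(x, omega):
    """|X(omega)|^2 / N with X(omega) = sum_m x_m exp(-i*omega*m)."""
    return abs(sum(xm * cmath.exp(-1j * omega * m) for m, xm in enumerate(x))) ** 2 / len(x)


x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1]
for omega in (0.0, 0.7, 2.1):
    print(lag_windowed_correlogram(x, omega, triangular_lag_window), periodogram(x, omega))
```

The identity holds because the factor (N - |k|)/N of the triangular window cancels the 1/(N - |k|) normalization, turning the unbiased estimates into the biased (1/N) ones whose transform is exactly the periodogram.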


4.6 COVARIANCE ESTIMATES AND ‘CORRELATION ANALYSIS’ 

Correlation analysis is a method based on statistical analysis for estimation of the weighting function $h(t)$. It applies to linear systems, and the input test signal is a white-noise sequence, usually generated as a pseudorandom binary sequence (PRBS).

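Consider the output $y$ obtained as a convolution between the input $u$ and the weighting function $h(k)$ as

$$y(kh) = \sum_{\ell=0}^{\infty} h(\ell)\,u((k-\ell)h) + e(kh) \qquad (4.26)$$

When the disturbance $e$ is uncorrelated with the input $u$, the covariances may be obtained from the equation

$$C_{yu}(kh) = \sum_{\ell=0}^{\infty} h(\ell)\,C_{uu}((k-\ell)h) \qquad (4.27)$$

The estimated counterparts based on $N$ measurements are

$$\hat C_{yu}(kh) = \frac{1}{N-k}\sum_{i=k+1}^{N} y_i u_{i-k}, \qquad \hat C_{uu}(kh) = \frac{1}{N-k}\sum_{i=k+1}^{N} u_i u_{i-k} \qquad (4.28)$$

An estimate of $h(k)$ may thus be obtained from the equation

$$\hat C_{yu}(kh) = \sum_{\ell=0}^{\infty} \hat h(\ell)\,\hat C_{uu}((k-\ell)h) \qquad (4.29)$$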


This infinite-dimensional equation is, in general, difficult to solve, but the special case of a white-noise input with

$$C_{uu}(kh) = \begin{cases} \sigma_u^2, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (4.30)$$

gives a simple estimate of the weighting function as

$$\hat h(k) = \frac{1}{\sigma_u^2}\,\hat C_{yu}(kh) \qquad (4.31)$$

Thus, a close connection exists between the cross-covariance function and the weighting function $h(k)$. Unfortunately, the poor numerical properties of weighting-function estimates also appear in correlation analysis: $\hat C_{yu}(kh)$ is in general a poor estimate of $C_{yu}(kh)$ for large $k$ values, a problem analogous to that occurring in the impulse response test.
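Correlation analysis with a PRBS input can be sketched as follows. The generator is a 7-bit maximum-length shift register (feedback from the two oldest stages, the usual PRBS7 taps) mapped to ±1 levels; the weighting function `h_true` is hypothetical, and the output is the noise-free periodic steady state. One period of a maximum-length ±1 sequence has circular autocorrelation N at lag zero and -1 at every other lag, so it is nearly white and (4.31) recovers the weighting function up to a small bias of order 1/N.

```python
def prbs(length=127):
    """One period of a +/-1 maximum-length sequence from a 7-bit shift register.

    Feedback is the XOR of the two oldest stages (PRBS7 taps); period 2**7 - 1 = 127.
    """
    state, out = 0x7F, []
    for _ in range(length):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        out.append(1.0 - 2.0 * new)
    return out


def circular_output(h, u):
    """Steady-state output y(k) = sum_j h(j) u(k - j) for a periodic input u."""
    n = len(u)
    return [sum(h[j] * u[(k - j) % n] for j in range(len(h))) for k in range(n)]


def correlation_analysis(y, u, n_lags):
    """Eq. (4.31): h_hat(k) = C_yu(k) / sigma_u^2, using one period of data."""
    n = len(u)
    var_u = sum(v * v for v in u) / n
    return [sum(y[i] * u[(i - k) % n] for i in range(n)) / (n * var_u)
            for k in range(n_lags)]


u = prbs()
h_true = [0.0, 0.5, 0.3, 0.1]            # hypothetical weighting function
y = circular_output(h_true, u)           # noise-free, periodic steady state
h_hat = correlation_analysis(y, u, 6)    # close to h_true, then small values
print([round(v, 4) for v in h_hat])
```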


4.7 HISTORICAL REMARKS 

The Fourier transform dates back about 200 years. Schuster (1898) developed the periodogram method for detecting hidden periodicities in sunspot activity data. Yule (1927) and Walker (1931) pioneered work on autoregressive models. Wiener and Khintchine studied stochastic processes with methods of Fourier transform and established relations between autocorrelation functions and spectra. Blackman and Tukey provided a practical implementation of Wiener's autocorrelation approach to power spectrum estimation in the 1950s. Cooley and Tukey presented the FFT (fast Fourier transform) in 1965.
4.9 EXERCISES

4.1 Show the convolution property of the z-transform; i.e., given that $X_1(z) = \mathcal{Z}\{x_1(k)\}$ and $X_2(z) = \mathcal{Z}\{x_2(k)\}$, show that

$$X(z) = \mathcal{Z}\{x_1(k) * x_2(k)\} = X_1(z)\,X_2(z) \qquad (4.32)$$

4.2 A swept-frequency sinusoid of the type used as input in Example 4.4 can be represented by the sequence of complex exponentials

$$\{u_k\}_{k=0}^{N-1} = \{e^{i\omega_0 k^2/2}\}_{k=0}^{N-1}, \quad \text{for some constant } \omega_0 \qquad (4.33)$$

Show that the transfer function $H(z)$ of the input-output relationship

$$y_k = \sum_{j=0}^{N-1} h_j u_{k-j} \qquad (4.34)$$

can be determined in a straightforward way by evaluating the z-transform

$$Y(z_n) = \sum_{k=0}^{N-1} y_k z_n^{-k}, \quad \text{where } z_n = e^{i\omega_0 n} \qquad (4.35)$$

at a sequence of points $\{z_n\}$.

Remark. This algorithm, known as the chirp-z transform and developed by Rabiner et al. (1969), thus evaluates the z-transform at a set of points other than those used in the ordinary discrete Fourier transform. ■
5.1 INTRODUCTION

An important task in system identification and statistics is to find the relationships, if any, that exist in a set of variables when at least one is random or unknown, being subject to random (unknown) fluctuations and possible measurement errors. In regression, typically one of the variables, often called the response or dependent variable, is of particular interest and is denoted by $y$. The other variables $\varphi_1, \ldots, \varphi_p$, usually called explanatory or independent variables, or regressors, are primarily used to predict or explain the behavior of $y$.

The appropriate mathematical relationship is sometimes known if the process is governed by accepted scientific laws or by a known physical process. It can happen that plots of data suggest that some relationship exists between $y$ and the $\varphi_i$, a relationship we might attempt to express via some function

$$y = f(\varphi_1, \ldots, \varphi_p; \theta) + v \qquad (5.1)$$

which is known except for some constants or coefficients $\theta$, called parameters, and a possible disturbance $v$. For instance, the function $f$ may be known except for the parameter vector $\theta = (\theta_1, \ldots, \theta_p)^T$, and a major aim of the investigation would then be to estimate the parameter vector $\theta$ of the process as precisely as possible. Finding a model is thus reduced to a question of estimating $\theta$ from experimental data. It is obviously important to use a parametric model flexible enough to represent or approximate a sufficiently wide range of behavior. An important special case is linear regression analysis based on the model

$$y(t) = \varphi^T(t)\,\theta + e(t) \qquad (5.2)$$

where additive errors $e$ are used and where $\varphi(t)$ is a $p$-dimensional vector. The regression vector $\varphi = (\varphi_1, \ldots, \varphi_p)^T$ can, of course, include nonlinear functions of data such as squares, cross products, and transformations such as logarithmic, trigonometric, or exponential functions. The important requirement is that the expression (5.2) is linear in the parameters.

Example 5.1—Linear regression models 
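The model

$$y = \theta_0 + \theta_1 u + \theta_2 u^2 = \begin{pmatrix} 1 & u & u^2 \end{pmatrix}\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} \qquad (5.3)$$

is a linear regression model with respect to the parameters $\theta_i$, whereas the model

$$y = \theta_0 + \theta_1 e^{\theta_2 u} \qquad (5.4)$$

is not a linear regression model, since it is nonlinear in $\theta_2$. Linear regression models are often illustrated by means of straight lines in linear spaces; see Fig. 5.1. ■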

It is often possible to convert nonlinear regression models to a form suitable 
for linear regression analysis as shown in the following example. 
Figure 5.1 Least-squares estimation applied to fit a straight line to data points indicated by '*'. Notice that the least-squares fit minimizes the sum of squared vertical deviations between the regression line and the observed data.

Example 5.2—Nonlinear models

Consider a sample of gas kept at constant temperature with a volume $V$ and a pressure $p$. Thermodynamical laws predict that

$$p\cdot V^{\gamma} = c \qquad (5.5)$$

where $\gamma$ and $c$ are constants. Experimental verification of this law and determination of the parameters $\gamma$ and $c$ with linear regression methods directly applied to (5.5) are obviously not possible. The nonlinear model may, however, be transformed to a linear regression model by logarithmic transformation

$$\log p = -\gamma\log V + \log c = \begin{pmatrix} -\log V & 1 \end{pmatrix}\begin{pmatrix} \gamma \\ \log c \end{pmatrix} \qquad (5.6)$$

The model (5.6) is now linear in the transformed parameters $\gamma$ and $\log c$. Notice, however, that the statistical properties of the transformed model might be quite different from those of the original model. ■
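The logarithmic transformation of Example 5.2 can be checked on synthetic, noise-free data. In this sketch the "true" values γ = 1.4 and c = 2 and the volume grid are hypothetical; a straight line fitted by least squares to (log V, log p) returns -γ as the slope and log c as the intercept.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys = slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


gamma_true, c_true = 1.4, 2.0                               # hypothetical "true" values
volumes = [1.0, 1.5, 2.0, 3.0, 4.0]
pressures = [c_true / v ** gamma_true for v in volumes]     # p V^gamma = c, Eq. (5.5)

# Linear regression on the transformed model (5.6): log p = -gamma log V + log c
slope, intercept = fit_line([math.log(v) for v in volumes],
                            [math.log(p) for p in pressures])
gamma_hat, c_hat = -slope, math.exp(intercept)
print(gamma_hat, c_hat)                                     # close to 1.4 and 2.0
```

With noisy pressures, the errors enter the fit multiplicatively after the transformation, which is the change in statistical properties that the example warns about.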

As defined in Eq. (5.2), linear regression analysis is based on the model

$$y(t) = \varphi^T(t)\,\theta + e(t) \qquad (5.7)$$

with additive errors $e$. The observations $y_k$ are assumed to be collected, with a constant sampling period $h$, together with the corresponding regression vectors $\{\varphi_k\}$:

$$y_k = y(kh), \qquad \varphi_k = \varphi(kh) = \begin{pmatrix} \varphi_{k1} & \varphi_{k2} & \cdots & \varphi_{kp} \end{pmatrix}^T \qquad (5.8)$$
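where the additive errors $e$ are assumed to have the properties

$$\mathcal{E}\{e_i\} = 0, \qquad \mathcal{E}\{e_i e_j\} = \sigma_e^2\,\delta_{ij}, \quad \forall i,j \qquad (5.9)$$

Assuming that there are $p$ parameters $\theta_1, \ldots, \theta_p$ to fit the model, with $\theta \in \mathbb{R}^p$, and $N$ observations, the problem is to find an estimate $\hat\theta$ of the parameter vector $\theta$ from the observed variables $\{y_k\}_{k=1}^N$ and $\{\varphi_k\}_{k=1}^N$:

$$\begin{aligned} y_1 &= \varphi_1^T\theta + e_1 \\ y_2 &= \varphi_2^T\theta + e_2 \\ &\;\;\vdots \\ y_N &= \varphi_N^T\theta + e_N \end{aligned} \qquad (5.10)$$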


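In matrix notation we obtain the vector of observations

$$Y_N = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} \qquad (5.11)$$

and the regressor matrix $\Phi_N$ and the error vector $e$, where

$$\Phi_N = \begin{pmatrix} \varphi_1^T \\ \varphi_2^T \\ \vdots \\ \varphi_N^T \end{pmatrix}, \quad\text{and}\quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix} \qquad (5.12)$$

Finally, the resulting estimation model for linear regression is

$$\mathcal{M}: \; Y_N = \Phi_N\theta + e \qquad (5.13)$$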

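The mismatch vector $\varepsilon$ (also called prediction errors) between the observations and the linear regression model for a certain parameter estimate $\hat\theta$ is

$$\varepsilon(\hat\theta) = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix} = Y_N - \Phi_N\hat\theta \qquad (5.14)$$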
The parameter vector $\theta \in \mathbb{R}^p$ often needs to be determined from a large set of observations. However, there is in general no satisfactory algebraic solution to the equation $\varepsilon = 0$, as this is an overdetermined linear equation system for $N > p$.

Several methods exist, such as the deterministic criteria of least-squares estimation and $L_1$ estimation, that involve minimization of criteria in the form of error functions (loss functions)

$$\min_\theta \sum_j |y_j - \varphi_j^T\theta|^2, \quad\text{or}\quad \min_\theta \sum_j |y_j - \varphi_j^T\theta| \qquad (5.15)$$

respectively. The aim is thus to find the model, of specified structure, that fits the observations best according to a deterministic measure of error between the model output and the observed output, considered over all observations.

Also, stochastic formulations such as maximum-likelihood and Bayesian estimation apply to obtain the “best” estimate $\hat\theta$. These methods are often somewhat more complicated, as they entail a choice of the stochastic distribution of the observation errors involved; see Chapter 6.


5.2 LEAST-SQUARES ESTIMATION 

Let $\theta$ denote an arbitrary estimate of the parameter vector. The least-squares criterion aims to minimize the sum of the squared errors between the model output and the observations,

$$V(\theta) = \frac{1}{2}\varepsilon^T\varepsilon = \frac{1}{2}\sum_{k=1}^{N}\varepsilon_k^2 = \frac{1}{2}(Y_N - \Phi_N\theta)^T(Y_N - \Phi_N\theta) \qquad (5.16)$$

with the minimum

$$\min_\theta V(\theta) = V(\hat\theta) \qquad (5.17)$$

obtained for the optimal estimate

$$\hat\theta = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T Y_N \qquad (5.18)$$

This can be seen by taking the gradient of the optimization criterion (5.16),

$$\frac{\partial V}{\partial\theta} = -Y_N^T\Phi_N + \theta^T(\Phi_N^T\Phi_N) \qquad (5.19)$$

where the condition $\partial V/\partial\theta = 0$ provides the normal equations

$$-Y_N^T\Phi_N + \hat\theta^T(\Phi_N^T\Phi_N) = 0 \qquad (5.20)$$

The gradient takes on the value zero for $\theta = \hat\theta = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T Y_N$, i.e., when $\theta$ is chosen as the least-squares estimate.

Notice that (5.19) and (5.20) are necessary conditions for obtaining a minimum. If the positive semidefinite matrix $\Phi_N^T\Phi_N$ is assumed to be invertible, then we can also show sufficiency by completing the squares of (5.16):

$$V(\theta) = \frac{1}{2}(Y_N - \Phi_N\theta)^T(Y_N - \Phi_N\theta) = \frac{1}{2}Y_N^T\big(I - \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T\big)Y_N + \frac{1}{2}(\theta - \hat\theta)^T\Phi_N^T\Phi_N(\theta - \hat\theta) \qquad (5.21)$$

As the first term of Eq. (5.21) does not depend on $\theta$, and as $\Phi_N^T\Phi_N$ is positive semidefinite and invertible by assumption, and hence positive definite, it may be concluded that $V(\theta)$ has a unique minimum at the least-squares solution

$$\hat\theta = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T Y_N \qquad (5.22)$$

The first term of Eq. (5.21), then, represents the minimum value, i.e., the residual sum.

Least-squares optimality has several attractive features for purposes of identification. First, large errors are heavily penalized. Second, the least-squares estimates can be obtained by straightforward matrix algebra. Third, the least-squares criterion is related to statistical variance, and the properties of the solution can be analyzed according to statistical criteria. Assuming that $\Phi_N^T\Phi_N$ is invertible, that the noise components are uncorrelated with the regressors, and that $\mathcal{E}\{e_i\} = 0$ and $\mathcal{E}\{e_i e_j\} = \sigma_e^2\delta_{ij}$ for all $i, j$, it follows that the least-squares estimate has the following statistical properties:

i. $\hat\theta$ is an unbiased estimate of $\theta$.

ii. The covariance matrix of $\hat\theta$ is

$$\mathrm{Cov}(\hat\theta) = \sigma_e^2(\Phi_N^T\Phi_N)^{-1} \qquad (5.23)$$

iii. An unbiased estimate of $\sigma_e^2$ is

$$\hat\sigma_e^2 = \frac{2V(\hat\theta)}{N-p} \qquad (5.24)$$

Proofs of the above statements i–iii are straightforward according to the following calculations, all of which rely on the assumption of regressors uncorrelated with disturbances, i.e., $\mathcal{E}\{\Phi_N^T e\} = 0$.
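i. The least-squares estimate is unbiased when the $e_i$ are uncorrelated with the regressors, since

$$\hat\theta = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T(\Phi_N\theta + e) = \theta + (\Phi_N^T\Phi_N)^{-1}\Phi_N^T e \;\Rightarrow\; \mathcal{E}\{\hat\theta - \theta\} = 0 \qquad (5.25)$$

ii. When the $e_i$ are uncorrelated with the regressors, it follows that

$$\mathcal{E}\{(\hat\theta - \theta)(\hat\theta - \theta)^T\} = \mathcal{E}\{(\Phi_N^T\Phi_N)^{-1}\Phi_N^T ee^T\Phi_N(\Phi_N^T\Phi_N)^{-1}\} = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T\mathcal{E}\{ee^T\}\Phi_N(\Phi_N^T\Phi_N)^{-1} = \sigma_e^2(\Phi_N^T\Phi_N)^{-1} \qquad (5.26)$$

iii. The expected minimum of the least-squares loss function can be calculated using the relationships (5.13) and (5.20):

$$\begin{aligned} \mathcal{E}\{V(\hat\theta)\} &= \tfrac{1}{2}\mathcal{E}\{(Y_N - \Phi_N\hat\theta)^T(Y_N - \Phi_N\hat\theta)\} = \tfrac{1}{2}\mathcal{E}\{e^T(I - \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T)e\} \\ &= \tfrac{1}{2}\mathcal{E}\{\mathrm{tr}\big((I_{N\times N} - \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T)ee^T\big)\} = \tfrac{1}{2}(\mathrm{tr}\,I_{N\times N} - \mathrm{tr}\,I_{p\times p})\sigma_e^2 = \tfrac{1}{2}(N-p)\sigma_e^2 \end{aligned} \qquad (5.27)$$

so that

$$\mathcal{E}\{\hat\sigma_e^2\} = \mathcal{E}\left\{\frac{2V(\hat\theta)}{N-p}\right\} = \sigma_e^2 \qquad (5.28)$$

which proves the statement.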

Example 5.3—Least-squares estimation of coefficients 
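Consider the following set of paired data

$$u = \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}^T \quad\text{and}\quad y = \begin{pmatrix} 6 & 17 & 34 & 57 \end{pmatrix}^T \qquad (5.29)$$

and assume that the following model should be fitted to the data:

$$\mathcal{M}: \; y = \theta_0 + \theta_1 u + \theta_2 u^2 = \begin{pmatrix} 1 & u & u^2 \end{pmatrix}\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} \qquad (5.30)$$

Let the regressor matrix be

$$\Phi = \begin{pmatrix} 1 & u_1 & u_1^2 \\ 1 & u_2 & u_2^2 \\ 1 & u_3 & u_3^2 \\ 1 & u_4 & u_4^2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{pmatrix} \qquad (5.31)$$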
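The least-squares solution $\hat\theta = (\Phi^T\Phi)^{-1}\Phi^T Y$ is then

$$\hat\theta = \begin{pmatrix} 4 & \sum_{i=1}^4 u_i & \sum_{i=1}^4 u_i^2 \\ \sum_{i=1}^4 u_i & \sum_{i=1}^4 u_i^2 & \sum_{i=1}^4 u_i^3 \\ \sum_{i=1}^4 u_i^2 & \sum_{i=1}^4 u_i^3 & \sum_{i=1}^4 u_i^4 \end{pmatrix}^{-1}\begin{pmatrix} \sum_{i=1}^4 y_i \\ \sum_{i=1}^4 u_i y_i \\ \sum_{i=1}^4 u_i^2 y_i \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \qquad (5.32)$$

which fits the polynomial to the data without residual error; indeed, $1 + 2u + 3u^2$ reproduces $y = 6, 17, 34, 57$ exactly. ■

Figure 5.2 Input $u$ and output $y$ obtained from observation of a system $y_k = ay_{k-1} + bu_{k-1} + e_k$.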
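Example 5.3 can be reproduced with the normal equations (5.22). This is a minimal pure-Python sketch (helper names hypothetical) that forms Φ^TΦ and Φ^TY and solves the 3×3 system by Gaussian elimination with partial pivoting. Forming Φ^TΦ squares the condition number of Φ, so for ill-conditioned regressors an orthogonal (QR) factorization is usually preferred; for this small example the normal equations are adequate.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]            # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(phi, y):
    """theta_hat = (Phi^T Phi)^{-1} Phi^T Y via the normal equations (5.20)/(5.22)."""
    p = len(phi[0])
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * yk for row, yk in zip(phi, y)) for i in range(p)]
    return solve(ata, aty)


u = [1.0, 2.0, 3.0, 4.0]
y = [6.0, 17.0, 34.0, 57.0]
phi = [[1.0, uk, uk ** 2] for uk in u]                       # regressor matrix (5.31)
theta = least_squares(phi, y)
print(theta)                                                 # approximately [1.0, 2.0, 3.0]
```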

Example 5.4—Least-squares identification of a first-order system 
Assume 1000 samples of inputs $u_k$ and outputs $y_k$ to have been collected from a system described by the equation

$$\mathcal{S}: \; y_k = ay_{k-1} + bu_{k-1} + e_k \qquad (5.33)$$

where the coefficients $a$ and $b$ are not known and need to be estimated. Artificial data have been generated with $a = 0.9$ and $b = 0.1$, and with $u$ and $e$ as random variables with variances $\sigma_u^2 = \sigma_e^2 = 1$. The mean of the disturbance $e$ is zero; see Fig. 5.2. It is natural to adopt the following linear regression model

$$\hat y_k = ay_{k-1} + bu_{k-1} = \begin{pmatrix} y_{k-1} & u_{k-1} \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} \qquad (5.34)$$

The least-squares estimate is

$$\begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} 0.8992 \\ 0.0899 \end{pmatrix} \qquad (5.35)$$

The loss function and the variance estimate are

$$V(\hat\theta) = 499.94, \quad\text{and}\quad \hat\sigma_e^2 = 1.0019 \qquad (5.36)$$

The estimated covariance matrix of $\hat\theta$ is

$$\hat\sigma_e^2(\Phi^T\Phi)^{-1} = \begin{pmatrix} 0.249 & -0.090 \\ -0.090 & 1.023 \end{pmatrix}\cdot 10^{-3} \qquad (5.37)$$

which provides good estimates considering the large disturbance level. The level surfaces of the loss function $V$ as a function of the parameter estimates $\hat a$ and $\hat b$ are shown in Fig. 5.3.

Figure 5.3 Level surfaces of the loss function $V$ as obtained in linear regression from observations of a system $y_k = ay_{k-1} + bu_{k-1} + e_k$.

Figure 5.4 Input $u$ and output $y$ obtained from observation of a system $y_k = ay_{k-1} + bu_{k-1} + e_k$.
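A simulation in the spirit of Example 5.4, written as a sketch with hypothetical helper names: data are generated from (5.33) with a = 0.9, b = 0.1 and unit-variance white u and e, and the two-parameter least-squares problem is solved in closed form. The function also returns V(θ̂), σ̂_e² = 2V/(N - p) from (5.24), and σ̂_e²(Φ^TΦ)^(-1) from (5.23). Because the random numbers differ from those behind (5.35) to (5.37), only the orders of magnitude, not the digits, should be expected to match. Setting the disturbance to zero gives exact recovery, as in Example 5.6.

```python
import random


def fit_first_order(y, u):
    """Least-squares fit of y_k = a*y_{k-1} + b*u_{k-1}, Eqs. (5.22)-(5.24).

    Returns (a_hat, b_hat), the loss V, the variance estimate sigma2 and the
    estimated covariance matrix sigma2 * (Phi^T Phi)^{-1}.
    """
    rows = [(y[k - 1], u[k - 1]) for k in range(1, len(y))]   # regressor matrix Phi
    targets = y[1:]
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * t for r, t in zip(rows, targets))
    t2 = sum(r[1] * t for r, t in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]   # (Phi^T Phi)^{-1}
    a_hat = inv[0][0] * t1 + inv[0][1] * t2
    b_hat = inv[1][0] * t1 + inv[1][1] * t2
    resid = [t - a_hat * r[0] - b_hat * r[1] for r, t in zip(rows, targets)]
    loss = 0.5 * sum(e * e for e in resid)                     # V(theta_hat), Eq. (5.16)
    sigma2 = 2.0 * loss / (len(targets) - 2)                   # Eq. (5.24) with p = 2
    cov = [[sigma2 * v for v in row] for row in inv]           # Eq. (5.23)
    return (a_hat, b_hat), loss, sigma2, cov


def simulate(n, a=0.9, b=0.1, noise_mean=0.0, noise_std=1.0, seed=0):
    """Hypothetical data generator for y_k = a*y_{k-1} + b*u_{k-1} + e_k with white u."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * n
    for k in range(1, n):
        y[k] = a * y[k - 1] + b * u[k - 1] + rng.gauss(noise_mean, noise_std)
    return y, u


y, u = simulate(1000)
theta, loss, sigma2, cov = fit_first_order(y, u)
print(theta, loss, sigma2, cov)
```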


Example 5.5—LS identification with non-zero disturbance mean 
Assume 1000 samples of inputs $u_k$ and outputs $y_k$ have been collected from a
system described by the equation

$$ \mathcal{S}:\; y_k = a y_{k-1} + b u_{k-1} + e_k, \qquad \mathcal{E}\{e_k\} = 1 \tag{5.38} $$

where the coefficients $a$ and $b$ are unknown and need to be estimated
(Fig. 5.4). Artificial data have been generated with $a = 0.9$ and $b = 0.1$,
with $u$ and $e$ as random variables with variances $\sigma_u^2 = \sigma_e^2 = 1$. The
mean of the disturbance $e$ is one, whereas it was zero in Example 5.4. By
adopting the linear regression model

$$ \mathcal{M}:\; \hat y_k = a y_{k-1} + b u_{k-1} = \begin{pmatrix} y_{k-1} & u_{k-1} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \tag{5.39} $$
and by fitting the least-squares estimate we obtain

(5.40)


which provides poor estimates as compared to Example 5.4. The main difference
between the conditions of Example 5.4 and this example is the non-zero
mean of the disturbance in this example. Clearly, the disturbance properties
in this example do not fulfill the condition of Eq. (5.9), and such disturbances
are for this reason characterized as colored noise.

The loss function and the variance estimate are

$$ V(\hat\theta) = 531.04, \qquad \hat\sigma_e^2 = 1.0642 \tag{5.41} $$

The estimated covariance matrix of $\hat\theta$ is

$$ \hat\sigma_e^2(\Phi^T\Phi)^{-1} = \begin{pmatrix} 0.0295 & -0.2734 \\ -0.2734 & 3.568 \end{pmatrix} \cdot 10^{-3} \tag{5.42} $$

which is close to being singular. Hence, the least-squares solution is sensitive
to colored noise. ■
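The bias in Example 5.5 can be reproduced qualitatively with a short simulation. This is a sketch under assumptions: zero-mean Gaussian input, a disturbance $e_k = 1 + $ (unit-variance white noise), and zero initial condition. The second fit, which adds a constant regressor so that the disturbance mean becomes an extra parameter, is only one possible reparametrization and is not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
u = rng.standard_normal(N)
e = 1.0 + rng.standard_normal(N)          # disturbance with mean one
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1] + e[k]

Y = y[1:]
Phi = np.column_stack([y[:-1], u[:-1]])   # model (5.39): no offset term
theta_ls, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

# Hypothetical remedy (not from the text): estimate the disturbance mean
# as a third parameter by adding a constant column to the regressors.
Phi_aug = np.column_stack([Phi, np.ones(N - 1)])
theta_aug, *_ = np.linalg.lstsq(Phi_aug, Y, rcond=None)

print("plain LS      (a, b)    =", theta_ls)
print("with offset   (a, b, m) =", theta_aug)
```

The plain fit pushes $\hat a$ toward one, because the constant part of the disturbance is correlated with the regressor $y_{k-1}$, whose mean is about $1/(1-a) = 10$.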


Example 5.6—Least-squares identification of a step response 
Assume that 1000 samples of inputs $u_k$ and outputs $y_k$ have been obtained from
a system described by the equation

$$ \mathcal{S}:\; y_k = a y_{k-1} + b u_{k-1} + e_k \tag{5.43} $$

where the coefficients $a$ and $b$ are unknown and need to be estimated
(Fig. 5.5). (The data have been generated with $a = 0.9$ and $b = 0.1$, with $u$
as a unit step input under noise-free conditions, i.e., $e_k = 0$ for all $k$.)
The least-squares estimate of the parameters $a$ and $b$ is obtained as

$$ \hat\theta = \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} 0.9000 \\ 0.1000 \end{pmatrix} \tag{5.44} $$

which provides very good estimates. The loss function and the variance
estimate are

$$ V(\hat\theta) = 1.7844 \cdot 10^{-26}, \qquad \hat\sigma_e^2 = 3.5760 \cdot 10^{-29} \tag{5.45} $$

The covariance matrix of $\hat\theta$ is

$$ \hat\sigma_e^2(\Phi^T\Phi)^{-1} = \begin{pmatrix} 0.520 & -0.514 \\ -0.514 & 0.512 \end{pmatrix} \cdot 10^{-28} \tag{5.46} $$
Figure 5.5 Input u and output y obtained from observation of a system $y_k = a y_{k-1} + b u_{k-1} + e_k$ during a noise-free step-response experiment.


where the eigenvalues 



(5.47) 


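exhibit large differences in magnitude. The corresponding eigenvectors are
the columns of the matrix

$$ \begin{pmatrix} 0.7100 & 0.7042 \\ -0.7042 & 0.7100 \end{pmatrix} \tag{5.48} $$

The eigenvector associated with the small eigenvalue approximately corresponds
to the linear combination $a + b$, for which the estimated variance is

$$ \mathrm{Var}\{c^T\hat\theta\} = \mathrm{Var}\left\{\begin{pmatrix} 1 & 1 \end{pmatrix}\hat\theta\right\} = \begin{pmatrix} 1 & 1 \end{pmatrix} \mathrm{Var}\{\hat\theta\} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \approx 0.004 \cdot 10^{-28} \tag{5.49} $$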

Hence, the sum $a + b$ is indicated to be very accurate, whereas the difference
of the two estimates is much less accurate. This result is related to the
conditions of the step-response experiment. Once the transient has died out,
$y_{k-1} \approx u_{k-1} = 1$, so the two regressors are nearly equal and most of the
data only determine the sum $a + b$. The general conclusion is, however, that
parameter estimates are good when the noise level is low.
Figure 5.6 Input u and output y obtained from observation of a system $y_k = a y_{k-1} + b u_{k-1} + e_k$ during a step-response experiment with a high noise level.


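The result of a similar experiment subject to a high noise level ($\sigma_e^2 = 1$) is
shown in Fig. 5.6. The least-squares parameter estimates are

$$ \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} 0.8836 \\ 0.1056 \end{pmatrix} \tag{5.50} $$

which provides good estimates despite the high noise level. The loss function
and the variance estimate are

$$ V(\hat\theta) = 484.1, \qquad \hat\sigma_e^2 = 0.970 \tag{5.51} $$

The covariance matrix of $\hat\theta$ is

$$ \hat\sigma_e^2(\Phi^T\Phi)^{-1} = \begin{pmatrix} 0.226 & -0.219 \\ -0.219 & 1.291 \end{pmatrix} \cdot 10^{-3} \tag{5.52} $$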


with eigenvalues 


(5.53) 
Figure 5.7 Linear regression and identification problems with offsets due to "outliers." Notice the outlier in the lower graph at $u \approx 0.8$ and $y \approx 19$.


The corresponding eigenvectors are the columns of the matrix

$$ \begin{pmatrix} 0.981 & -0.194 \\ 0.194 & 0.981 \end{pmatrix} \tag{5.54} $$

where the first and second columns correspond to the linear combinations of
small variance and large variance, respectively. The eigen-decomposition of the
covariance matrix can thus be used to determine which linear combinations of
the estimated parameters have small or large variance. ■
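The eigen-analysis behind (5.47)-(5.49) and (5.53)-(5.54) can be checked directly from the printed covariance matrices. This is a sketch that only uses the rounded entries of (5.46) and (5.52). The small eigenvalue of (5.46) is therefore sensitive to that rounding. Also note that `numpy.linalg.eigh` returns eigenvalues in ascending order and eigenvectors with arbitrary sign.

```python
import numpy as np

# Covariance matrices as printed in (5.46) and (5.52)
P_noise_free = np.array([[0.520, -0.514], [-0.514, 0.512]]) * 1e-28
P_noisy = np.array([[0.226, -0.219], [-0.219, 1.291]]) * 1e-3

for name, P in [("(5.46)", P_noise_free), ("(5.52)", P_noisy)]:
    lam, vecs = np.linalg.eigh(P)    # ascending eigenvalues, orthonormal eigenvectors
    print(name, "eigenvalues:", lam)
    print("eigenvectors (columns):\n", vecs)

c = np.array([1.0, 1.0])             # the linear combination a + b
var_sum = c @ P_noise_free @ c       # Var{c^T theta} = c^T Var{theta} c, cf. (5.49)
print("Var{a + b} =", var_sum)
```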


A common flaw is that one residual is very much larger than any of the others. 
Such a residual is often called an outlier and generally causes problems in 
least-squares estimation. 


Example 5.7—Sensitivity of the least-squares method to “outliers” 
Assume 100 observations of inputs $u_k$ and outputs $y_k$ have been collected from
the process

$$ \mathcal{S}:\; y_k = u_{k-1} + w_k, \qquad \mathcal{E}\{w_k\} = 0, \quad \mathcal{E}\{w_i w_j\} = \sigma^2\delta_{ij} \tag{5.55} $$

with $\sigma^2 = 1$, and assume that the following model is fitted to the data

(5.56)


The correct values $\theta = (0 \;\; 1)^T$ may be compared to the least-squares
estimate

(5.57)


with $\hat\sigma^2 = 0.985$ and with the estimated covariance matrix

$$ \hat\sigma^2(\Phi^T\Phi)^{-1} = 0.985 \begin{pmatrix} 0.0526 & -0.0066 \\ -0.0066 & 0.0010 \end{pmatrix} = \begin{pmatrix} 0.0518 & -0.0065 \\ -0.0065 & 0.0010 \end{pmatrix} \tag{5.58} $$


Assume now that a data transmission error produces one abnormal data point
at $u \approx 0.8$ and $y \approx 19$ (see Fig. 5.7). The least-squares estimate is now

$$ \hat\theta = \begin{pmatrix} 1.163 \\ 0.865 \end{pmatrix} \tag{5.59} $$

with $\hat\sigma^2 = 4.277$. The estimated covariance matrix is

$$ \hat\sigma^2(\Phi^T\Phi)^{-1} = \begin{pmatrix} 0.2248 & -0.0281 \\ -0.0281 & 0.0043 \end{pmatrix} \tag{5.60} $$


Notice that the estimated variance has increased more than fourfold due to
the single outlier. Clearly, it is a serious problem that the estimated
covariance matrix (5.60) falsely indicates with some confidence that the parameter
estimates are accurate. ■

Hence, the sensitivity of the least-squares estimate to disturbances other than
white noise is a serious concern, which calls for careful examination of the data
in order to detect and deal with possible outliers that would otherwise compromise
the result.
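The effect of a single outlier, as in Example 5.7, can be illustrated with a small simulation. Since the model (5.56) and the input range are not recoverable from the text, this sketch assumes several things: an intercept-plus-slope model $y = \theta_1 + \theta_2 u$ (so that the true values are $(0, 1)$), a uniformly distributed input on $[0, 13]$, and the time shift dropped, so that each $y$ is paired with its own $u$. The helper name `fit` is hypothetical.

```python
import numpy as np

def fit(Phi, y):
    """Ordinary least squares with the variance estimate eps^T eps / (N - p)."""
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    eps = y - Phi @ theta
    return theta, eps @ eps / (len(y) - Phi.shape[1])

rng = np.random.default_rng(2)
N = 100
u = rng.uniform(0.0, 13.0, N)              # assumed input range (not given in the text)
y = u + rng.standard_normal(N)             # y = u + w with sigma^2 = 1; time shift dropped
Phi = np.column_stack([np.ones(N), u])     # assumed model y = theta_1 + theta_2 * u

theta_clean, s2_clean = fit(Phi, y)

y_bad = y.copy()
y_bad[np.argmin(np.abs(u - 0.8))] = 19.0   # one corrupted sample near u = 0.8
theta_bad, s2_bad = fit(Phi, y_bad)

print("clean:  ", theta_clean, s2_clean)
print("outlier:", theta_bad, s2_bad)
```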


5.3 OPTIMAL LINEAR UNBIASED ESTIMATORS 

The least-squares estimator to determine parameters of a model $y_i = \varphi_i^T\theta + e_i$
has valuable properties under the assumptions $\mathcal{E}\{e_i\} = 0$ and $\mathcal{E}\{e_i e_j\} =
\sigma_e^2\delta_{ij}$. Unfortunately, these assumptions are restrictive, and it is valuable to
identify the class of all linear estimates of the form

$$ \hat\theta = \Gamma^T\mathcal{Y} \tag{5.61} $$

where $\Gamma \in \mathbb{R}^{N\times p}$ is a matrix of suitable dimensions, so that $\hat\theta$ is a linear
function of the data vector $\mathcal{Y} = \Phi\theta + e$. The corresponding parameter error is

$$ \tilde\theta = \hat\theta - \theta = \Gamma^T\mathcal{Y} - \theta = (\Gamma^T\Phi - I_{p\times p})\theta + \Gamma^T e \tag{5.62} $$

The additional conditions $\Gamma^T\Phi = I$ and $\mathcal{E}\{\Gamma^T e\} = 0$ must be imposed to
satisfy the extra requirement of being an unbiased estimator

$$ \mathcal{E}\{\hat\theta\} = \Gamma^T\Phi\theta + \mathcal{E}\{\Gamma^T e\} = \theta \tag{5.63} $$

Determination of the best possible estimator involves minimization of the
covariance

$$ \mathrm{Cov}(\hat\theta) = \mathcal{E}\{(\hat\theta - \theta)(\hat\theta - \theta)^T\} = \mathcal{E}\{(\Gamma^T\mathcal{Y} - \theta)(\Gamma^T\mathcal{Y} - \theta)^T\} = \Gamma^T R\,\Gamma \tag{5.64} $$

for $R = \mathcal{E}\{ee^T\}$. The Lagrangian associated with this constrained optimization
problem is

$$ L(\Gamma, \Lambda) = \theta^T\Gamma^T R\,\Gamma\theta + \mathrm{tr}\,\Lambda(\Gamma^T\Phi - I) \tag{5.65} $$

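where the first term should be minimized for any $\theta$. The second term consists
of a matrix of Lagrange multipliers $\Lambda$ multiplying the constraint $\Gamma^T\Phi - I$,
imposed to satisfy the requirement of unbiased estimates. The partial
derivatives of the Lagrangian are

$$ \begin{aligned} 0 &= \frac{\partial L}{\partial \Gamma} = 2R\,\Gamma\theta\theta^T + \Phi\Lambda \\ 0 &= \frac{\partial L}{\partial \Lambda} = \Gamma^T\Phi - I \end{aligned} \tag{5.66} $$

Multiplying the first equation above from the left by $\Phi^T R^{-1}$, using the
constraint $\Phi^T\Gamma = I$, and solving for $\Lambda$ for any vector $\theta$ gives

$$ \Lambda = -2(\Phi^T R^{-1}\Phi)^{-1}\theta\theta^T \tag{5.67} $$

Substituting this in the first equation gives

$$ 2R\left(\Gamma - R^{-1}\Phi(\Phi^T R^{-1}\Phi)^{-1}\right)\theta\theta^T = 0 \tag{5.68} $$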
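which determines $\Gamma$ as

$$ \Gamma = R^{-1}\Phi(\Phi^T R^{-1}\Phi)^{-1} \tag{5.69} $$

The optimal unbiased estimator

$$ \hat\theta = \Gamma^T\mathcal{Y} = (\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}\mathcal{Y} \tag{5.70} $$

is known as the Markov estimate, with the covariance matrix

$$ \mathrm{Cov}\{\hat\theta\} = (\Phi^T R^{-1}\Phi)^{-1} \tag{5.71} $$

This estimator is also known as the best linear unbiased estimate, or by the
acronym BLUE. It is noteworthy that the same optimal estimate is obtained
by minimizing the loss function

$$ V(\theta) = \frac{1}{2}(\mathcal{Y} - \Phi\theta)^T R^{-1}(\mathcal{Y} - \Phi\theta) \tag{5.72} $$

as is seen by completing the squares of (5.72)

$$ \begin{aligned} V(\theta) = {} & \frac{1}{2}\mathcal{Y}^T\left(R^{-1} - R^{-1}\Phi(\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}\right)\mathcal{Y} \\ & + \frac{1}{2}\left(\theta - (\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}\mathcal{Y}\right)^T(\Phi^T R^{-1}\Phi)\left(\theta - (\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}\mathcal{Y}\right) \end{aligned} \tag{5.73} $$

where only the second, nonnegative term depends on $\theta$, and it vanishes for the
estimate (5.70).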
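The Markov estimate (5.70) with covariance (5.71) can be computed without forming $R^{-1}$ explicitly. This is a minimal sketch under the assumption of a known diagonal $R$ with unequal variances. The name `markov_estimate` is hypothetical. The comparison shows the Gauss-Markov property: the covariance of ordinary least squares under the same $R$ is never smaller.

```python
import numpy as np

def markov_estimate(Phi, Y, R):
    """Markov (BLUE) estimate (5.70), its covariance (5.71) and Gamma (5.69)."""
    Rinv_Phi = np.linalg.solve(R, Phi)        # R^{-1} Phi
    M = Phi.T @ Rinv_Phi                      # Phi^T R^{-1} Phi
    cov = np.linalg.inv(M)                    # (5.71)
    theta = cov @ (Rinv_Phi.T @ Y)            # (5.70), using R = R^T
    Gamma = Rinv_Phi @ cov                    # (5.69)
    return theta, cov, Gamma

rng = np.random.default_rng(3)
N = 200
Phi = np.column_stack([np.ones(N), rng.standard_normal(N)])
theta_true = np.array([1.0, 2.0])
variances = rng.uniform(0.1, 5.0, N)          # assumed known, unequal noise variances
R = np.diag(variances)
Y = Phi @ theta_true + np.sqrt(variances) * rng.standard_normal(N)

theta_blue, cov_blue, Gamma = markov_estimate(Phi, Y, R)

# Covariance of ordinary least squares under the same R, for comparison
A_ols = np.linalg.solve(Phi.T @ Phi, Phi.T)   # (Phi^T Phi)^{-1} Phi^T
cov_ols = A_ols @ R @ A_ols.T
print(theta_blue)
print(np.diag(cov_blue), np.diag(cov_ols))
```

With $R = \sigma^2 I$ the function reduces to ordinary least squares.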


5.4 LINEAR REGRESSION IN THE FREQUENCY DOMAIN 


Frequency response fitting based on least-squares identification in the complex
frequency domain is a natural idea. Let the polynomial ratio

$$ \hat G(i\omega) = \frac{B(i\omega)}{A(i\omega)} = \frac{b_1(i\omega)^{n-1} + \cdots + b_n}{(i\omega)^n + a_1(i\omega)^{n-1} + \cdots + a_n} \tag{5.74} $$

denote a transfer function estimate to be fitted to the experimental data
$G(i\omega_k)$, known at the frequency points $\omega_k$, $k = 1, 2, \ldots, N$. A natural
goal of optimization is to minimize the error

$$ \min_{a,b} \sum_{k=1}^{N} \left| G(i\omega_k) - \frac{B(i\omega_k)}{A(i\omega_k)} \right|^2 \tag{5.75} $$
Figure 5.8 Transfer function estimation by least-squares method (upper left), input-output ratio (upper right), spectral ratio (lower left), and the Levy method (lower right). The nominal transfer function (solid line) is shown in all cases. Transfer function estimates obtained from spectral estimates are shown in dotted lines.


A difficulty with minimizing (5.75) is that the expressions involved are not linear
in the parameters, and thus this optimal estimation problem cannot be solved
by ordinary least-squares identification. The Levy method (Fig. 5.8) partially
resolves this difficulty by minimizing the error function

$$ \min_{a,b} \sum_{k=1}^{N} \left| A(i\omega_k)G(i\omega_k) - B(i\omega_k) \right|^2 \tag{5.76} $$


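This error function may be formulated as a standard least-squares problem
by defining the vector

$$ \mathcal{Y} = \begin{pmatrix} (i\omega_1)^n G(i\omega_1) \\ (i\omega_2)^n G(i\omega_2) \\ \vdots \\ (i\omega_N)^n G(i\omega_N) \end{pmatrix} \tag{5.77} $$

and the regressor matrix

$$ \Phi = \begin{pmatrix} -(i\omega_1)^{n-1}G(i\omega_1) & \cdots & -G(i\omega_1) & (i\omega_1)^{n-1} & \cdots & 1 \\ -(i\omega_2)^{n-1}G(i\omega_2) & \cdots & -G(i\omega_2) & (i\omega_2)^{n-1} & \cdots & 1 \\ \vdots & & \vdots & \vdots & & \vdots \\ -(i\omega_N)^{n-1}G(i\omega_N) & \cdots & -G(i\omega_N) & (i\omega_N)^{n-1} & \cdots & 1 \end{pmatrix} \tag{5.78} $$

and parameter vector

$$ \theta = \begin{pmatrix} a_1 & \cdots & a_n & b_1 & \cdots & b_n \end{pmatrix}^T \tag{5.79} $$

The least-squares solution minimizing (5.76) is then

$$ \hat\theta = (\Phi^*\Phi)^{-1}\Phi^*\mathcal{Y} \tag{5.80} $$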

where $\Phi^*$ denotes the transpose and complex conjugate of $\Phi$. This method is
capable of fitting complicated frequency responses and has been widely used,
although its use is associated with a number of difficulties. One possible problem
is that the method may fail to yield a good fit if the data span several decades
of frequency. The components of both $\mathcal{Y}$ and $\Phi$ then differ widely in magnitude
owing to their multiplication by $A(i\omega)$ as part of the Levy approach, which
entails heavy weighting of large values of $\omega_k$. A remedy is to filter $\mathcal{Y}$ and $\Phi$
by some approximation to the filter $1/A(i\omega)$ and iterate this procedure.
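Levy's linearized problem (5.76)-(5.80) can be sketched as follows. This is a minimal sketch under assumptions not stated in the text. The function name `levy_fit` is hypothetical. The real and imaginary parts of (5.77)-(5.78) are stacked so that the estimated parameters are real. For real $\theta$, this minimizes the same criterion (5.76) and is equivalent to using $\mathrm{Re}\{\Phi^*\Phi\}$ and $\mathrm{Re}\{\Phi^*\mathcal{Y}\}$ in (5.80). No $1/A(i\omega)$ reweighting or iteration is included.

```python
import numpy as np

def levy_fit(w, G, n):
    """Fit B(iw)/A(iw) with deg A = n, deg B = n - 1 by Levy's linearization (5.76)."""
    s = 1j * w
    Y = s**n * G                                           # vector (5.77)
    cols = [-(s**(n - j)) * G for j in range(1, n + 1)]    # columns for a_1..a_n
    cols += [s**(n - j) for j in range(1, n + 1)]          # columns for b_1..b_n
    Phi = np.column_stack(cols)                            # regressor matrix (5.78)
    # Stack real and imaginary parts so that theta comes out real-valued
    Phi_r = np.vstack([Phi.real, Phi.imag])
    Y_r = np.concatenate([Y.real, Y.imag])
    theta, *_ = np.linalg.lstsq(Phi_r, Y_r, rcond=None)
    return theta[:n], theta[n:]                            # (a_1..a_n), (b_1..b_n)

# Illustrative noise-free data from G(s) = (2s + 3) / (s^2 + 4s + 5)
w = np.logspace(-1, 1, 50)
s = 1j * w
G = (2 * s + 3) / (s**2 + 4 * s + 5)
a_hat, b_hat = levy_fit(w, G, 2)
print(a_hat, b_hat)
```

With noise-free data the fit is exact. With noisy data the high-frequency weighting discussed above becomes visible.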



*Least-squares properties of the discrete Fourier transform

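Assume that the continuous variable $y(t)$ is sampled with $y_k = y(kh)$ and
suppose that these data should be fitted to the series

$$ \hat y_k = \sum_{m=0}^{N-1} \theta_m \exp(i\omega_k m h) \tag{5.81} $$

with a set of coefficients $\{\theta_m\}$ and $\omega_k = k \cdot 2\pi/(Nh)$ for $k = 0, 1, \ldots, N-1$.
The approximating values $\hat y_k$ can be expressed in matrix form as

$$ \hat y = \Phi\theta \tag{5.82} $$

where $\hat y = \begin{pmatrix} \hat y_0 & \hat y_1 & \cdots & \hat y_{N-1} \end{pmatrix}^T$, $\theta = \begin{pmatrix} \theta_0 & \theta_1 & \cdots & \theta_{N-1} \end{pmatrix}^T$, and

$$ \Phi = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ e^{i\omega_0 h} & e^{i\omega_1 h} & \cdots & e^{i\omega_{N-1} h} \\ \vdots & \vdots & & \vdots \\ e^{i\omega_0 (N-1)h} & e^{i\omega_1 (N-1)h} & \cdots & e^{i\omega_{N-1}(N-1)h} \end{pmatrix} \tag{5.83} $$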
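A least-squares approach to estimate $\theta$ that applies here is the
minimization problem

$$ \min_\theta \sum_{k=0}^{N-1} |y_k - \hat y_k|^2 = \min_\theta (y - \Phi\theta)^*(y - \Phi\theta) \tag{5.84} $$

with the solution

$$ \hat\theta = (\Phi^*\Phi)^{-1}\Phi^* y \tag{5.85} $$

Since the columns of $\Phi$ are orthogonal, $\sum_{k=0}^{N-1} e^{i(\omega_m - \omega_l)kh} = N\delta_{lm}$, a
simplification gives

$$ (\Phi^*\Phi)^{-1} = \frac{1}{N} I \tag{5.86} $$

and

$$ \hat\theta = \frac{1}{N}\Phi^* y \tag{5.87} $$

so that

$$ \hat\theta_m = \frac{1}{N}\sum_{k=0}^{N-1} y_k \exp(-i\omega_m kh) \tag{5.88} $$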


The discrete Fourier transform may thus be viewed as a least-squares fit of the
data to a sum of complex exponentials. It is also worth noting that the power
of the sinusoidal component at the frequency $\omega_m$ is

$$ |\hat\theta_m|^2 = \frac{1}{N^2}\left|\sum_{k=0}^{N-1} y_k \exp(-i\omega_m kh)\right|^2 \tag{5.89} $$

This expression is proportional to the periodogram expression (4.4). Thus the
periodogram may be viewed as a least-squares fit of a set of sinusoids to the
data.
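The equivalence between the least-squares solution (5.85) and the DFT coefficients (5.88) is easy to check numerically. This is a sketch with arbitrary test data and an assumed $N = 16$, $h = 0.1$. NumPy's `np.fft.fft` uses the sign convention $\sum_k y_k e^{-2\pi i km/N}$ without normalization, so it is divided by $N$ here.

```python
import numpy as np

N, h = 16, 0.1
k = np.arange(N)
omega = 2 * np.pi * k / (N * h)                  # omega_k = k * 2*pi/(N h)
Phi = np.exp(1j * np.outer(k, omega) * h)        # Phi[k, m] = exp(i omega_m k h), cf. (5.83)

rng = np.random.default_rng(4)
y = rng.standard_normal(N)                       # arbitrary test data

theta_ls = np.linalg.lstsq(Phi, y.astype(complex), rcond=None)[0]   # (5.85)
theta_dft = Phi.conj().T @ y / N                                     # (5.87)-(5.88)
power = np.abs(theta_dft) ** 2                                       # (5.89)
print(np.max(np.abs(theta_ls - theta_dft)))
```

The sum of `power` over all frequencies equals the mean square of the data (Parseval's relation), which is one way to see the connection to the periodogram.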


5.5 LEAST-SQUARES ESTIMATION WITH LINEAR CONSTRAINTS 

It is sometimes desirable to find the $\theta$ that minimizes

$$ V(\theta) = \frac{1}{2}(\mathcal{Y} - \Phi\theta)^T R^{-1}(\mathcal{Y} - \Phi\theta) \quad \text{subject to} \quad F\theta - G = 0 \tag{5.90} $$


Problems of this form appear, for instance, when some measurement is considered
to be of higher quality than ordinary data, or when there are physical
reasons for some linear dependence between the parameters. As before, it is
assumed that $\mathcal{Y} = \Phi\theta + e$ with error covariance $\mathcal{E}\{ee^T\} = R$.

The Lagrangian for the constrained optimization problem (5.90) is

$$ L(\theta, \lambda) = \frac{1}{2}(\mathcal{Y} - \Phi\theta)^T R^{-1}(\mathcal{Y} - \Phi\theta) + \lambda^T(F\theta - G) \tag{5.91} $$

where $\lambda$ is a vector of Lagrange multipliers. The gradient

$$ \frac{\partial L}{\partial \theta} = \Phi^T R^{-1}\Phi\theta - \Phi^T R^{-1}\mathcal{Y} + F^T\lambda \tag{5.92} $$

vanishes at

$$ \hat\theta = (\Phi^T R^{-1}\Phi)^{-1}(\Phi^T R^{-1}\mathcal{Y} - F^T\lambda) \tag{5.93} $$

By multiplying (5.93) from the left by $F$ and substituting the constraint
$F\hat\theta = G$, it is possible to solve for the Lagrange multiplier

$$ \lambda = \left(F(\Phi^T R^{-1}\Phi)^{-1}F^T\right)^{-1}\left(F(\Phi^T R^{-1}\Phi)^{-1}\Phi^T R^{-1}\mathcal{Y} - G\right) \tag{5.94} $$

The estimate (5.93) contains one term that is identical to the unbiased optimal
estimate (5.70) and a second correction term that is proportional to $\lambda$.


Example 5.8—Least-squares estimation with specified static gain 
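Consider the least-squares identification problem in Example 5.4 and assume
that it is desirable to find the least-squares estimate with a specified static
gain

$$ \frac{b}{1 - a} \tag{5.95} $$

For a specified gain value $K_0$, the condition $b/(1 - a) = K_0$ is equivalent to
$K_0 a + b = K_0$, i.e., a linear constraint $F\theta = G$ with $F = (K_0 \;\; 1)$ and $G = K_0$.
Using the data from Example 5.4, the estimate is then modified to

$\hat\theta =$

(5.96)

The loss function is $V(\hat\theta) = 500.07$, and it is straightforward to verify that this
estimate satisfies the linear constraint (5.95). ■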
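The constrained estimate (5.93)-(5.94) can be sketched for the setting of Example 5.8. Since the specified gain value is not recoverable here, the sketch assumes $K_0 = 1$ (the static gain of the true system), white noise with $R = I$, and freshly simulated data. The name `constrained_ls` is hypothetical.

```python
import numpy as np

def constrained_ls(Phi, Y, R, F, G):
    """Minimize (1/2)(Y - Phi th)^T R^{-1} (Y - Phi th) s.t. F th = G, via (5.93)-(5.94)."""
    Rinv_Phi = np.linalg.solve(R, Phi)
    M = Phi.T @ Rinv_Phi                                  # Phi^T R^{-1} Phi
    Minv_rhs = np.linalg.solve(M, Rinv_Phi.T @ Y)         # unconstrained estimate (5.70)
    Minv_Ft = np.linalg.solve(M, F.T)                     # (Phi^T R^{-1} Phi)^{-1} F^T
    lam = np.linalg.solve(F @ Minv_Ft, F @ Minv_rhs - G)  # (5.94)
    theta = Minv_rhs - Minv_Ft @ lam                      # (5.93)
    return theta, lam

rng = np.random.default_rng(0)
N = 1000
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1] + rng.standard_normal()
Phi = np.column_stack([y[:-1], u[:-1]])
Y = y[1:]
R = np.eye(N - 1)                       # white noise with unit variance (assumption)
K0 = 1.0                                # hypothetical specified static gain
F = np.array([[K0, 1.0]])
G = np.array([K0])

def loss(th):
    r = Y - Phi @ th
    return 0.5 * r @ r

theta_c, lam = constrained_ls(Phi, Y, R, F, G)
theta_u = np.linalg.lstsq(Phi, Y, rcond=None)[0]
print("unconstrained:", theta_u, loss(theta_u))
print("constrained:  ", theta_c, loss(theta_c))
```

As in Example 5.8, the constrained loss is slightly larger than the unconstrained one.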
Figure 5.9 Orthogonal projection, where $\hat y$ is the projection of $y$ on the subspace spanned by $x_1$, $x_2$.


5.6 *A GEOMETRICAL INTERPRETATION 

The mathematical solution to the least-squares problem offers an interesting
geometrical interpretation. The least-squares estimate $\hat{\mathcal{Y}}_N$ can be regarded
as the projection of $\mathcal{Y}_N$ on the subspace spanned by the columns of $\Phi_N$, with
the projection matrix

$$ P_N = \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T, \quad \text{which satisfies } P_N^2 = P_N \tag{5.97} $$

When $P_N$ multiplies $\mathcal{Y}_N$, it provides the projection $\hat{\mathcal{Y}}_N = \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T\mathcal{Y}_N = \Phi_N\hat\theta_N$,
which is the linear combination of columns of $\Phi_N$ that is closest to $\mathcal{Y}_N$ (see
Fig. 5.9). The corresponding minimum of the least-squares criterion is

$$ V(\hat\theta_N) = \frac{1}{2}\varepsilon^T\varepsilon = \frac{1}{2}\left(\mathcal{Y}_N^T\mathcal{Y}_N - \hat{\mathcal{Y}}_N^T\hat{\mathcal{Y}}_N\right) \tag{5.98} $$

which indicates that a poorly fitting model yields a projection $\hat{\mathcal{Y}}_N$ that is small
in magnitude compared with $\mathcal{Y}_N$. The relationship between the geometrical
interpretation and the optimal estimate can be formulated as the following lemma.

Lemma 5.1—Principle of orthogonality 

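Let $Y$ be a Euclidean (inner-product) space, $y \in Y$, and let $X$ be a subspace of $Y$.
Then $\hat y \in X$ attains the minimum

$$ \min_{x \in X}(x - y)^T(x - y) = (\hat y - y)^T(\hat y - y) \tag{5.99} $$

if and only if

$$ (y - \hat y)^T x = 0, \quad \forall x \in X \tag{5.100} $$

Proof: Suppose that there exists an $x \in X$ such that $(y - \hat y)^T x = \alpha \neq 0$. Then for
any scalar $\lambda$

$$ (y - \hat y + \lambda x)^T(y - \hat y + \lambda x) = (y - \hat y)^T(y - \hat y) + \lambda^2 x^T x + 2\lambda\alpha \tag{5.101} $$

By choosing $\lambda = -\alpha/x^T x$ we find that $(y - \hat y + \lambda x)^T x = 0$ and

$$ (y - \hat y + \lambda x)^T(y - \hat y + \lambda x) < (y - \hat y)^T(y - \hat y) \tag{5.102} $$

Since $\hat y - \lambda x \in X$ is then closer to $y$ than $\hat y$ is, this shows the orthogonality
property (5.100) to be necessary for optimality.
Now suppose that $(y - \hat y)^T x = 0$ for any $x \in X$. Then, for any scalar $\lambda$ and any $x \in X$,

$$ (y - \hat y + \lambda x)^T(y - \hat y + \lambda x) = (y - \hat y)^T(y - \hat y) + \lambda^2 x^T x \geq (y - \hat y)^T(y - \hat y) \tag{5.103} $$

which proves the optimality. ■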

An application of the principle of orthogonality to the least-squares parameter
estimation problem gives the orthogonality condition $\Phi_N^T\varepsilon = 0$ (see Fig. 5.9).
Using the principle of orthogonality it is also possible to formulate a problem
that is equivalent to the least-squares problem: find $\hat\theta_N$ and $\varepsilon$ such that
$\mathcal{Y}_N - \Phi_N\hat\theta_N = \varepsilon$, subject to the constraint $\Phi_N^T\varepsilon = 0$, i.e., prediction errors and
regressors are required to be uncorrelated (or orthogonal). Combination of the
two equations yields the normal equations $\Phi_N^T\Phi_N\hat\theta_N = \Phi_N^T\mathcal{Y}_N$. Moreover, these
two equations can be formulated as the linear system of equations

$$ \begin{pmatrix} I & \Phi_N \\ \Phi_N^T & 0 \end{pmatrix}\begin{pmatrix} \varepsilon \\ \hat\theta_N \end{pmatrix} = \begin{pmatrix} \mathcal{Y}_N \\ 0 \end{pmatrix} \tag{5.104} $$

The formulation (5.104) as a symmetric indefinite linear system is known as
the augmented system method of solving least-squares estimation problems.
The matrix in (5.104) is called the augmented system matrix, and special
numerical methods have been developed to solve such systems of linear equations;
see (Björck, 1991). Another approach is to use the pseudo-inverse based on
the singular value decomposition, which provides the minimizer $(\varepsilon^T \;\; \hat\theta_N^T)^T$ with
the smallest 2-norm also in the case of a rank-deficient augmented system matrix
(see Appendix A).
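The projection properties (5.97)-(5.98), the orthogonality condition $\Phi_N^T\varepsilon = 0$ and the augmented system (5.104) can all be checked on random data. This is a sketch with an arbitrary $30 \times 3$ regressor matrix. Note that `np.linalg.solve` uses a general LU factorization, not the specialized symmetric indefinite methods referred to above.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 30, 3
Phi = rng.standard_normal((N, p))      # arbitrary full-rank regressors
Y = rng.standard_normal(N)

P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)   # projection matrix (5.97)
Y_hat = P @ Y                                   # projection of Y on the column space
eps = Y - Y_hat                                 # least-squares residual

# Augmented system (5.104): [[I, Phi], [Phi^T, 0]] [eps; theta] = [Y; 0]
K = np.block([[np.eye(N), Phi], [Phi.T, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([Y, np.zeros(p)]))
eps_aug, theta_aug = sol[:N], sol[N:]

theta_ls = np.linalg.lstsq(Phi, Y, rcond=None)[0]
print(np.max(np.abs(theta_aug - theta_ls)))
```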
5.7 *MULTIVARIABLE SYSTEM IDENTIFICATION

Consider a multi-input, multi-output system

$$ \mathcal{S}:\; A(z^{-1})Y(z) = B(z^{-1})U(z), \qquad \det A(z^{-1}) \not\equiv 0 \tag{5.105} $$

with $p$ inputs $u_k \in \mathbb{R}^p$ and $m$ outputs $y_k \in \mathbb{R}^m$ and polynomial matrices

$$ \begin{aligned} A(z^{-1}) &= I_{m\times m} + A_1 z^{-1} + \cdots + A_n z^{-n}, & A_1, \ldots, A_n &\in \mathbb{R}^{m\times m} \\ B(z^{-1}) &= B_1 z^{-1} + \cdots + B_n z^{-n}, & B_1, \ldots, B_n &\in \mathbb{R}^{m\times p} \end{aligned} \tag{5.106} $$


A characteristic problem for multivariable linear systems is that, in general,
there is no unique factorization $(A(z^{-1}), B(z^{-1}))$ that corresponds to a given
transfer function $H(z^{-1}) = A^{-1}(z^{-1})B(z^{-1})$. For instance, the following two
factorizations

$$ \begin{pmatrix} 1 - z^{-1} & 0 \\ 0 & 1 - z^{-1} \end{pmatrix} Y(z) = \begin{pmatrix} z^{-1} \\ z^{-1} \end{pmatrix} U(z), \qquad \begin{pmatrix} 1 - z^{-1} & z^{-1} - z^{-2} \\ 0 & 1 - z^{-1} \end{pmatrix} Y(z) = \begin{pmatrix} z^{-1} + z^{-2} \\ z^{-1} \end{pmatrix} U(z) \tag{5.107} $$

represent the same transfer function despite their different parametrizations.
In fact, the second factorization is obtained from the first by multiplication
of the polynomial matrices $A(z^{-1})$ and $B(z^{-1})$ from the left by the polynomial
matrix

$$ Q(z^{-1}) = \begin{pmatrix} 1 & z^{-1} \\ 0 & 1 \end{pmatrix}, \qquad \det Q(z^{-1}) = 1 \tag{5.108} $$

Hence, given a multivariable transfer function $H(z^{-1})$, it is suitable to define
an equivalence class of factorizations $(Q(z^{-1})A(z^{-1}), Q(z^{-1})B(z^{-1}))$ of $H(z^{-1})$
for any stable, causal, and invertible polynomial matrix $Q(z^{-1})$. Every member
of this equivalence class yields the same transfer function

$$ \left(Q(z^{-1})A(z^{-1})\right)^{-1}Q(z^{-1})B(z^{-1}) = A^{-1}(z^{-1})B(z^{-1}) = H(z^{-1}) \tag{5.109} $$

Accordingly, any member of this equivalence class can be used to describe
cross-coupling, delay, and other transfer function properties. An important
conclusion is that assumptions on unique parametrizations are artificial and
should be avoided unless there is an explicit a priori motivation. However,
since for practical reasons it is desirable to use a finite number of
well-defined parameters, it is often suitable to choose the parameter set with
the smallest 2-norm. For the purpose of least-squares identification, then, it
is suitable to organize model and data according to


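$$ \begin{aligned} y_k &= -A_1 y_{k-1} - \cdots - A_n y_{k-n} + B_1 u_{k-1} + \cdots + B_n u_{k-n}, & y_k &\in \mathbb{R}^m \\ \varphi_k &= \begin{pmatrix} -y_{k-1}^T & \cdots & -y_{k-n}^T & u_{k-1}^T & \cdots & u_{k-n}^T \end{pmatrix}^T, & \varphi_k &\in \mathbb{R}^{n(m+p)} \\ \theta &= \begin{pmatrix} A_1 & \cdots & A_n & B_1 & \cdots & B_n \end{pmatrix}^T, & \theta &\in \mathbb{R}^{n(m+p)\times m} \end{aligned} \tag{5.110} $$

which suggests the linear regression model

$$ \mathcal{M}:\; \mathcal{Y}_N = \Phi_N\theta, \quad \text{with} \quad \mathcal{Y}_N = \begin{pmatrix} y_1^T \\ \vdots \\ y_N^T \end{pmatrix} \quad \text{and} \quad \Phi_N = \begin{pmatrix} \varphi_1^T \\ \vdots \\ \varphi_N^T \end{pmatrix} \tag{5.111} $$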

The normal equations of the associated least-squares estimation of $\theta$ will, as
a result of the non-uniqueness of parameters, in general exhibit rank deficiency.
It is therefore natural to apply the least-squares solution

$$ \hat\theta_N = (\Phi_N^T\Phi_N)^{+}\Phi_N^T\mathcal{Y}_N \tag{5.112} $$

where $(\Phi_N^T\Phi_N)^{+}$ denotes the matrix pseudo-inverse of $\Phi_N^T\Phi_N$; see Appendix A.
The least-squares estimate thus obtained has the smallest 2-norm of all
possible minimizers of the least-squares criterion.

Example 5.9—Multivariable system identification 
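Consider the multivariable system

$$ \mathcal{S}:\; y_k = \begin{pmatrix} 0.5 & 0.4 \\ 0.4 & 0.5 \end{pmatrix} y_{k-1} + \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} u_{k-1}, \qquad u_k, y_k \in \mathbb{R}^2 \tag{5.113} $$

with data according to Fig. 5.10, which exhibit obvious cross-coupling
properties. The estimate for model order $n = 1$ is

$$ \hat A_1^T = \begin{pmatrix} -0.5 & -0.4 \\ -0.4 & -0.5 \end{pmatrix}, \quad \hat B_1^T = \begin{pmatrix} 1.0 & 1.0 \\ 1.0 & -1.0 \end{pmatrix}, \qquad \|\hat\theta_N\|_2 = 1.676, \quad \|\hat\theta_N\|_F = 2.195 \tag{5.114} $$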
Figure 5.10 A multi-input, multi-output system with two inputs in the vector $u_k$ and two outputs in the vector $y_k$ from Example 5.9.


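whereas an estimate for $n = 2$ is

$$ \hat A_1^T = \begin{pmatrix} -0.5000 & -0.4000 \\ -0.2827 & -0.3534 \end{pmatrix}, \quad \hat A_2^T = \begin{pmatrix} -0.0469 & -0.0587 \\ -0.0587 & -0.0733 \end{pmatrix}, \quad \hat B_1^T = \begin{pmatrix} 1.0000 & 1.0000 \\ 1.0000 & -1.0000 \end{pmatrix}, \quad \hat B_2^T = \begin{pmatrix} 0.1173 & 0.1466 \\ -0.1173 & -0.1466 \end{pmatrix} $$

$$ \|\hat\theta_N\|_2 = 1.641, \qquad \|\hat\theta_N\|_F = 2.168 \tag{5.115} $$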


Notice the interesting property that the higher-order estimate for n = 2 gives 
a result with a lower 2-norm of the estimate than the estimate for n = 1. ■ 
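The minimum-norm estimate (5.112) for the system (5.113) can be sketched with NumPy's `lstsq`, which returns the minimum-norm least-squares solution and an effective rank. This sketch rests on several assumptions. The input is white noise rather than the signal of Fig. 5.10, the data are noise-free, and singular values below $10^{-10}$ times the largest are treated as zero. The helper `arx_regressors` is hypothetical. For $n = 2$ the regressor matrix loses rank because of the exact relation $y_{k-1} = A y_{k-2} + B u_{k-2}$. The numbers for $n = 2$ depend on the input and need not reproduce (5.115), but the Frobenius norm can never exceed the $n = 1$ value, since the zero-padded $n = 1$ estimate is also a minimizer.

```python
import numpy as np

def arx_regressors(y, u, n):
    """Rows phi_k^T = (-y_{k-1}^T ... -y_{k-n}^T  u_{k-1}^T ... u_{k-n}^T), cf. (5.110)."""
    rows = []
    for k in range(n, y.shape[0]):
        past_y = [-y[k - j] for j in range(1, n + 1)]
        past_u = [u[k - j] for j in range(1, n + 1)]
        rows.append(np.concatenate(past_y + past_u))
    return np.array(rows), y[n:]

A_true = np.array([[0.5, 0.4], [0.4, 0.5]])
B_true = np.array([[1.0, 1.0], [1.0, -1.0]])
rng = np.random.default_rng(6)
N = 300
u = rng.standard_normal((N, 2))              # assumed white-noise input
y = np.zeros((N, 2))
for k in range(1, N):
    y[k] = A_true @ y[k - 1] + B_true @ u[k - 1]   # noise-free version of (5.113)

estimates = {}
for n in (1, 2):
    Phi, Y = arx_regressors(y, u, n)
    # Minimum-norm least-squares solution; tiny singular values are treated as zero
    theta, _, rank, _ = np.linalg.lstsq(Phi, Y, rcond=1e-10)
    estimates[n] = (theta, rank)
    print(n, rank, np.linalg.norm(theta, 2), np.linalg.norm(theta, "fro"))
```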
5.8 CONCLUDING REMARKS

Two important prerequisites should be fulfilled for the least-squares method
to be applied without complications. These conditions are that
$\Phi_N^T\Phi_N$ is invertible and that the noise is uncorrelated with the regressors,
so that $\mathcal{E}\{\Phi_N^T\varepsilon\} = 0$. It is often suitable to formulate the invertibility
condition as a condition on the experimental procedure called excitation, which
can be measured in terms of the matrix rank, determinant, or singular values
of $\Phi_N^T\Phi_N$. For instance, an excitation characterized by
$\mathrm{rank}(\Phi_N^T\Phi_N) = m$ means that the variability of the input is such that at most
$m$ parameters can be uniquely determined as the result of a particular
experiment. Thus, when fitting a model with $p$ parameters where $p > m$, the low
excitation results in an underdetermined set of normal equations. Since the
excitation properties of the experiment providing data for the least-squares
estimation problem are straightforward to compute, such a check should become
part of the experimental procedure. Failure to fulfill the condition of
invertibility means that there is no unique solution to the least-squares
problem. However, the underdetermined (or rank-deficient) least-squares problem
may still be solved in a meaningful way, using the matrix pseudo-inverse (see
Appendix A).

Unfortunately, the second assumption of uncorrelated regressors and
disturbances, $\mathcal{E}\{\Phi_N^T\varepsilon\} = 0$, is difficult to check either a priori or a posteriori, a
circumstance which constitutes one of the major problems with least-squares
identification (see Example 5.5). Violation of this assumption may lead to
inconsistent parameter estimates with a bias
$\mathcal{E}\{\hat\theta_N - \theta\} = \mathcal{E}\{(\Phi_N^T\Phi_N)^{-1}\Phi_N^T\varepsilon\}$. This is an important problem,
which is sometimes successfully resolved by some reparametrization and which
is further addressed in Chapter 6.

Linear estimates are attractive because analytical expressions are available for
the unique optimal estimates as well as for their covariance, and because good
software for linear algebra makes them simple to compute. Methods of solving
the normal equations include standard Cholesky-type linear equation solvers or
the QR factorization (see Appendix A) implemented with Householder
orthogonalization, modified Gram-Schmidt methods, or Givens orthogonalization.
Another important approach is to rely on the singular value decomposition and
associated matrix pseudo-inverses, which permit solution of the normal
equations or the augmented system (5.104); see (Golub and Van Loan, 1989) for a
systematic account of numerical issues. In addition, a number of special
methods have been developed that exploit certain matrix structures of the
normal equations; these will be presented in Chapter 6, and recursive
least-squares methods in Chapter 11.

Finally, it should be mentioned that it is important to check the condition
number of the normal equations as well as the least-squares residual sum, and
that the occurrence of large numbers should alert one to impending numerical
problems.
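The excitation check recommended above can be made part of the experimental procedure with a few lines of code. This sketch compares the noise-free first-order system of Example 5.6 driven by a unit step with the same system driven by white noise. The helper name `excitation_report` is hypothetical. The step input makes the regressors $y_{k-1}$ and $u_{k-1}$ nearly collinear, which shows up as a much larger condition number of $\Phi_N^T\Phi_N$.

```python
import numpy as np

def excitation_report(Phi):
    """Rank, condition number and singular values of Phi^T Phi."""
    M = Phi.T @ Phi
    return np.linalg.matrix_rank(M), np.linalg.cond(M), np.linalg.svd(M, compute_uv=False)

N = 1000
inputs = {
    "step": np.ones(N),
    "white noise": np.random.default_rng(7).standard_normal(N),
}
results = {}
for label, u in inputs.items():
    y = np.zeros(N)
    for k in range(1, N):
        y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1]   # noise-free system of Example 5.6
    Phi = np.column_stack([y[:-1], u[:-1]])
    results[label] = excitation_report(Phi)
    rank, cond, sv = results[label]
    print(f"{label:12s} rank={rank} cond={cond:.3g} singular values={sv}")
```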


5.9 HISTORICAL REMARKS AND REFERENCES 


The application of the least-squares method dates back to the beginning of
the nineteenth century and the mathematician C.F. Gauss, who developed a
method of fitting an elliptical planetary orbit to a few observations. He
successfully used the method to predict the orbit of the asteroid Ceres, which
was lost shortly after its discovery by J. Piazzi in 1801. The medical doctor
and amateur astronomer W. Olbers succeeded in finding the lost planetoid by
means of the orbit predictions provided by Gauss. Both Gauss's least-squares
method and his results regarding error distribution, now known as the normal
distribution, were published in

- C.F. Gauss, Theoria motus corporum coelestium in sectionibus conicis
solem ambientium. 1809.


A pioneering modern book is 

- P. Whittle, Prediction and Regulation by Linear Least Squares Methods. 
New York: Van Nostrand, 1963. 


Several computation methods that apply to solution of least-squares problems
are to be found in

- G.H. Golub and C.F. Van Loan, Matrix Computations, 2d ed. Baltimore:
The Johns Hopkins University Press, 1989.

- A. Björck, “Pivoting and stability in the augmented system method.”
Report LiTH-MAT-R-1991-30, Dept. of Mathematics, Linköping University,
Sweden.

- J.R. Bunch and L. Kaufman, “Some stable methods for calculating inertia
and solving symmetric linear systems.” Mathematics of Computation, Vol.
31, 1977, pp. 163-179.

Least-squares estimation in the frequency domain was introduced in 
- E.C. Levy, “Complex curve fitting.” IRE Trans. Autom. Control, AC-4,
1959, pp. 37-43.


5.10 EXERCISES 

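5.1 Show that the bias of the least-squares estimate is
$\mathcal{E}\{\hat\theta_N - \theta\} = \mathcal{E}\{(\Phi_N^T\Phi_N)^{-1}\Phi_N^T\varepsilon\}$.

5.2 Calculate the least-squares parameter estimate using the matrix pseudo-inverse
for subsets of data from Example 5.3 with rank-deficient $\Phi_N^T\Phi_N$,
e.g., $u = (1, 2)^T$ and $\mathcal{Y} = (6, 17)^T$. Verify by computation that the
pseudo-inverse solution has the smallest $\|\hat\theta\|_2$ among all least-squares
estimates obtained for $N = 2$.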

5.3 Assume that the noise sequence $\{e_k\}$ consists of independent normally
distributed components $e_k \in N(0, \sigma^2)$. Show that the least-squares
estimate $\hat\theta_N$ of the parameters of the model $y_k = \varphi_k^T\theta + e_k$ is asymptotically
normally distributed.

5.4 Show how the normal equations should be modified in order to solve the
weighted least-squares problem

$$ \min_\theta (\mathcal{Y}_N - \Phi_N\theta)^T W(\mathcal{Y}_N - \Phi_N\theta) \tag{5.116} $$

where $W$ is a suitable weighting matrix.

5.5 Organize the weighted least-squares equations similar to Eq. (5.104). 

5.6 Assume that the error components $\{e_k\}_{k=1}^{N}$ have zero mean and covariance
matrix $\mathcal{E}\{e_i e_j^T\} = R\delta_{ij}$. What is the residual sum of the
least-squares criterion for the optimal parameter estimate?

5.7 Show that it is possible to complete the squares of the Lagrangian (5.65)
in a manner similar to (5.21) provided that is invertible. What
is the residual sum for the optimal constrained parameter estimate?
How does this compare to the unconstrained estimate?

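5.8 Show that the inverse of the augmented system matrix (5.104) is

$$ \begin{pmatrix} I & \Phi_N \\ \Phi_N^T & 0 \end{pmatrix}^{-1} = \begin{pmatrix} I - \Phi_N(\Phi_N^T\Phi_N)^{-1}\Phi_N^T & \Phi_N(\Phi_N^T\Phi_N)^{-1} \\ (\Phi_N^T\Phi_N)^{-1}\Phi_N^T & -(\Phi_N^T\Phi_N)^{-1} \end{pmatrix} \tag{5.117} $$

when $\Phi_N^T\Phi_N$ is invertible.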

5.9 Verify that the estimate (5.70) is optimal with respect to the criterion 
(5.72). 
5.10 Formulate the least-squares criterion for multivariable data, cf. Section
5.7. Verify that minimization of this criterion yields the standard normal 
equations. 

5.11 Assume that input-output data $\{u_k\}$ and $\{y_k\}$ are observed from the
system

$$ \mathcal{S}:\; y_k + a y_{k-1} = b u_{k-1} + w_k + c w_{k-1} \tag{5.118} $$

where $\{u_k\}$ and $\{w_k\}$ are independent white-noise sequences with
variances $\sigma_u^2 = \sigma_w^2 = \sigma^2$. Assume that the model

$$ \mathcal{M}:\; y_k + \hat a y_{k-1} = \hat b u_{k-1} \tag{5.119} $$

is fitted to data by means of least-squares identification. Show that the
prediction error variance is minimized for

$$ \hat\theta = \begin{pmatrix} \hat a & \hat b \end{pmatrix}^T = \begin{pmatrix} a - c\sigma^2/\mathcal{E}\{y_k^2\} & b \end{pmatrix}^T \tag{5.120} $$

and that the prediction error variance is smaller than that for $\hat a = a$ and
$\hat b = b$.

5.12 An impulse response test has given the result

Time t                  0+    0.2   0.4   0.6   0.8
Impulse response h(t)   3.4   2.3   1.7   1.2   0.9

a. Determine a least-squares estimate of the parameters K and τ.

b. Determine an estimate of the parameters K and τ which minimizes the
squared relative error.

c. Determine an estimate of the parameters K and τ by means of a
least-squares fit of the transformed model

$$ \log h(t) = \log K - \frac{t}{\tau} = \begin{pmatrix} 1 & -t \end{pmatrix}\begin{pmatrix} \log K \\ 1/\tau \end{pmatrix} \tag{5.121} $$





Identification of 
Time-Series Models 


6.1 INTRODUCTION 

The approaches to modeling presented thus far have been confined to black-box
models of linear systems and linear regression models

$$ y_k = \varphi_k^T\theta + v_k \tag{6.1} $$

Linear regression identification consists in reformulating various estimation
and prediction problems in the form (5.10), which, for suitable definitions of
the observations $y_k$, regressors $\varphi_k$, and disturbance $v_k$, also applies directly to
the model

$$ A(z^{-1})y_k = B(z^{-1})u_k + v_k \tag{6.2} $$

where the discrete time series $\{y_k\}$ and $\{u_k\}$ provide data. The term time
series here simply means a sequence of data ordered in time.

We now change the focus to models of the type (6.2), including their extension
to more sophisticated disturbance models, which we call time-series modeling
or time-series analysis. The corresponding identification problem consists in
determining the model structure and estimating the parameters of the
polynomials involved. In many cases such parameters can be identified by using
a linear regression approach. The least-squares solutions to the linear
regression problems have excellent properties in cases where the disturbances at
different times are uncorrelated. However, the presence of outliers or noise
with non-zero mean or otherwise correlated disturbances may give rise to
characteristic systematic errors and bias in the parameter estimates (see
Examples 5.5 and 5.7). The systematic errors in cases with more complicated
spectral disturbance characteristics therefore constitute a significant problem.
In particular, these systematic errors are unsatisfactory in the many cases where
it is desirable to perform time-series modeling as well as careful spectrum
analysis. Hence, the presence of correlated disturbances other than white noise
renders bias reduction necessary. Efforts to solve such problems have given rise
to several extensions of linear regression models. In particular,
maximum-likelihood methods applied to estimation of autoregressive moving
average models are of central importance in this context.


6.2 MODEL STRUCTURES 


Time-series analysis is concerned with functions of time that exhibit random 
properties. In many situations it is of interest to consider a vector 


Xk 




(6.3) 


of time series, in which case $\{x_k\}$ is said to be a multivariate time series.
Sometimes the time series may be a function of several temporal and spatial 
variables, in which case it is said to be a multidimensional time series. 

Model classification of time-series models distinguishes, for reasons of
complexity, between single-input, single-output models and multi-input,
multi-output models; linear and nonlinear models; and deterministic and
stochastic models. Since a deterministic model is in many situations inadequate
to describe a system, it is natural in such cases to consider the system outputs
as being a realization of a stochastic process that can only be described by
statistical laws. Identification of time-series models offers several statistical
approaches to model fitting in addition to the deterministic criteria used in
least-squares identification. There are also several specialized topics, such as
detection of signals consisting of sinusoids confounded with noise (common in
signal processing) and fitting of stochastic models with certain probability
distribution functions. As the ultimate test of a model is, of course, its
adequacy for a specific purpose, the model structure is usually a compromise
between simplicity and the power to predict the observed behavior from given
inputs to the system.

There are at least three important categories of time-series models:
o Difference equations and ARMAX models 

o Transfer function models 

o State-space models 

In this context, we usually regard the true parameters as constant but unknown. In the case of time-varying parameters there is the additional problem of estimating instantaneous values of the parameters, a difficulty which calls for special methods. Aggregate or lumped parameters may also occur, where two or more physical parameters combine to form a new parameter; cf. Example 1.1, where the transfer-function time-constant parameters RC and LC arise from the electronic parameters R, C, and L. Moreover, models arising from physical modeling may exhibit distributed parameters that originate from approximating partial differential equations by means of a discretized model or an ordinary differential equation.

Let us consider some important classes of linear discrete-time systems. 

ARMAX models and difference equations 

The ARMAX models (autoregressive moving average with exogenous input) constitute an important class of difference equations of the form

$$ A(z^{-1})\, y_k = z^{-d} B(z^{-1})\, u_k + C(z^{-1})\, w_k \tag{6.4} $$

where $d$ is a time delay and $A$, $B$, $C$ are polynomials

$$ \begin{aligned} A(z^{-1}) &= 1 + a_1 z^{-1} + \dots + a_{n_A} z^{-n_A} \\ B(z^{-1}) &= b_0 + b_1 z^{-1} + \dots + b_{n_B} z^{-n_B} \\ C(z^{-1}) &= 1 + c_1 z^{-1} + \dots + c_{n_C} z^{-n_C} \end{aligned} \tag{6.5} $$

with the unknown parameters

$$ \theta = \begin{pmatrix} a_1 & \dots & a_{n_A} & b_0 & \dots & b_{n_B} & c_1 & \dots & c_{n_C} & \sigma_w^2 \end{pmatrix}^T \tag{6.6} $$

Notice that the only special case of the ARMAX model that admits a reformulation as a linear regression model is the controlled autoregressive (ARX) model

$$ A(z^{-1})\, y_k = z^{-d} B(z^{-1})\, u_k + w_k \tag{6.7} $$

where $w_k$ is white noise. Hence, there is no immediate reformulation of the general ARMAX model that results in a linear regression model $\mathcal{Y} = \Phi\theta + v$ unless the disturbances $\{w_k\}$ are available for measurement.
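As a minimal sketch of why the ARX model (6.7) is a linear regression, the Python code below assumes $d = 1$, $n_A = 1$ and $n_B = 0$, i.e. $y_k + a_1 y_{k-1} = b_0 u_{k-1} + w_k$, so that $\varphi_k = (-y_{k-1}\;\; u_{k-1})^T$ and $\theta = (a_1\;\; b_0)^T$. The simulated system, the random ±1 input, and the noise level are illustrative assumptions, not data from the text.

```python
import random


def simulate_arx(a1, b0, n, noise_std, seed=0):
    """Simulate y_k + a1*y_{k-1} = b0*u_{k-1} + w_k with a random +/-1 input."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [0.0] * n
    for k in range(1, n):
        y[k] = -a1 * y[k - 1] + b0 * u[k - 1] + rng.gauss(0.0, noise_std)
    return u, y


def arx_least_squares(u, y):
    """Least-squares estimate of (a1, b0) with regressor phi_k = (-y_{k-1}, u_{k-1})."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y)):
        p1, p2 = -y[k - 1], u[k - 1]
        s11 += p1 * p1           # entries of Phi^T Phi
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[k]          # entries of Phi^T Y
        r2 += p2 * y[k]
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations by Cramer's rule
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


u, y = simulate_arx(a1=-0.9, b0=0.1, n=2000, noise_std=0.01)
a1_hat, b0_hat = arx_least_squares(u, y)
print(f"a1_hat = {a1_hat:.3f}, b0_hat = {b0_hat:.3f}")  # should lie near -0.9 and 0.1
```

Because $w_k$ is white in this setup, the regressors are uncorrelated with the equation error and the estimate is consistent; replacing $w_k$ by $C(z^{-1})w_k$ removes that property, which motivates the methods that follow.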

The ARMAX models include several interesting special cases, such as the autoregressive (AR) model

$$ A(z^{-1})\, y_k = w_k \tag{6.8} $$

which is effective for modeling harmonics confounded with noise. The moving average (MA) model

$$ y_k = C(z^{-1})\, w_k \tag{6.9} $$

is another type that is popular in signal processing as a basis for (FIR) filter design and for the identification of truncated impulse responses. The ARMA model

$$ A(z^{-1})\, y_k = C(z^{-1})\, w_k \tag{6.10} $$

is effective in modeling disturbance spectra with spectral peaks as well as zeros and is therefore used in model-based spectrum analysis.

The prediction error methods (i.e., methods that predict $y$ based on previous data and the identified model) are natural to use in the context of time-series models for prediction and filtering. It is sometimes argued that all model-fitting methods are prediction error methods, in the sense that they compare the behavior of a model and the model-based prediction with experimental data and try to adjust the model to obtain a better fit. In addition, the technical solutions to many estimation problems are based on similar numerical optimization methods. However, prediction error methods in a more restricted and adequate sense comprise a class of identification methods based on optimization criteria of the type

$$ \min_\theta \mathrm{E}\Big\{ \sum_{k=1}^{N-\tau} \big(\hat y_{k+\tau|k}(\theta) - y_{k+\tau}\big)^2 \Big\} = \min_\theta \mathrm{E}\Big\{ \sum_{k=1}^{N-\tau} \varepsilon_{k+\tau|k}^2(\theta) \Big\} \tag{6.11} $$
Figure 6.1 Prediction error method or “equation error” method based on the one-step-ahead prediction error $\varepsilon_k = \hat y_{k|k-1} - y_k$.

These predictors minimize the variance of the prediction $\tau$ steps ahead of the output $y$, where the prediction is based on present data; see Fig. 6.1.

Example 6.1—Prediction error for a first-order model 

Consider the first-order model

$$ y_k = -a y_{k-1} + b u_{k-1} + w_k + c w_{k-1}, \qquad \mathrm{E}\{w_k\} = 0, \quad \mathrm{E}\{w_k^2\} = \sigma^2 \tag{6.12} $$

If we consider any predictor $\hat y_{k|k-1}$ of $y_k$ based upon data up to sample $k-1$, then we can estimate its variance by

$$ \mathrm{E}\{(y_k - \hat y_{k|k-1})^2\} = \mathrm{E}\{(-a y_{k-1} + b u_{k-1} + c w_{k-1} - \hat y_{k|k-1})^2\} + \mathrm{E}\{w_k^2\} \ge \sigma^2 \tag{6.13} $$

where the two terms depend on data up to time $k-1$ and data at time $k$, respectively; the cross term vanishes because $w_k$ has zero mean and is independent of all data up to time $k-1$. As the first term is nonnegative, we may conclude that Eq. (6.13) provides a lower bound $\sigma^2$ on the prediction error variance, i.e., on the achievable accuracy of any predictor $\hat y_{k|k-1}$.

As the noise $\{w_k\}$ is not measured, we use the optimal predictor of $y_k$ based on the data $\{y_k\}$, $\{u_k\}$ and the parameter vector $\theta = (a\ b\ c)^T$, which is

$$ \hat y_{k|k-1}(\theta) = -c\, \hat y_{k-1|k-2}(\theta) + (c - a)\, y_{k-1} + b u_{k-1} \tag{6.14} $$

The resulting prediction error $\varepsilon_k(\theta) = y_k - \hat y_{k|k-1}(\theta)$ is, for $|c| < 1$, determined from the recursive equation

$$ \varepsilon_k(\theta) + c\, \varepsilon_{k-1}(\theta) = y_k + a y_{k-1} - b u_{k-1} \tag{6.15} $$

Hence, the prediction error approach to identification is to determine $\theta$ so that the prediction error variance is minimized. ■
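The predictor (6.14) can be checked numerically. The sketch below uses illustrative parameter values $a = -0.9$, $b = 0.1$, $c = 0.7$ (assumptions, chosen so that the system is stable and $|c| < 1$), simulates (6.12) and runs the predictor. At the true parameters the prediction errors reproduce the white noise $w_k$, so the loss reaches the lower bound of (6.13), whereas a wrong value of $c$ gives a larger loss.

```python
import random


def simulate_armax1(a, b, c, n, sigma=1.0, seed=1):
    """Simulate y_k = -a*y_{k-1} + b*u_{k-1} + w_k + c*w_{k-1}, Eq. (6.12)."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    w = [rng.gauss(0.0, sigma) for _ in range(n)]
    y = [w[0]] + [0.0] * (n - 1)          # y_0 = w_0: no earlier history
    for k in range(1, n):
        y[k] = -a * y[k - 1] + b * u[k - 1] + w[k] + c * w[k - 1]
    return u, y, w


def prediction_errors(theta, u, y):
    """Run the one-step predictor (6.14) and return eps_k = y_k - yhat_{k|k-1}."""
    a, b, c = theta
    yhat = 0.0                             # yhat_{0|-1}: no prior information
    eps = [y[0] - yhat]
    for k in range(1, len(y)):
        yhat = -c * yhat + (c - a) * y[k - 1] + b * u[k - 1]
        eps.append(y[k] - yhat)
    return eps


theta_true = (-0.9, 0.1, 0.7)             # illustrative (a, b, c); pole at z = 0.9
u, y, w = simulate_armax1(*theta_true, n=1000)
eps = prediction_errors(theta_true, u, y)
V_true = 0.5 * sum(e * e for e in eps)
V_wrong = 0.5 * sum(e * e for e in prediction_errors((-0.9, 0.1, 0.0), u, y))
max_dev = max(abs(e - wk) for e, wk in zip(eps, w))
print(f"V(true) = {V_true:.1f}, V(c=0) = {V_wrong:.1f}, max|eps - w| = {max_dev:.1e}")
```

The predictor recursion is itself a filter with pole $-c$, which is why the condition $|c| < 1$ is needed for the prediction errors to remain bounded.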


Transfer function models 

A transfer function model that allows for deterministic and stochastic modeling is

$$ y_k = H_u(z)\, u_k + H_v(z)\, v_k, \qquad \mathrm{E}\{v_k v_j\} = \sigma_v^2\, \delta_{kj} $$

where $\{v_k\}$ is a discrete-time noise sequence, and $\{u_k\}$, $\{y_k\}$ are input and output data, respectively.

There are several algorithmic reasons to factorize transfer functions into numerator and denominator polynomials. In the context of identification, there are two popular transfer function models

$$ \begin{aligned} y_k &= \frac{B(z^{-1})}{F(z^{-1})}\, u_k + v_k, &&\text{Output error model} \\ y_k &= \frac{B(z^{-1})}{F(z^{-1})}\, u_k + \frac{C(z^{-1})}{D(z^{-1})}\, w_k, &&\text{Box-Jenkins model} \end{aligned} \tag{6.16} $$

where the first model (output error model) contains no assumption on the spectrum of the disturbance sequence $\{v_k\}$, whereas the second model (sometimes called a Box-Jenkins model) contains a noise model with the white-noise sequence $\{w_k\}$ filtered through the transfer function $C/D$. An attractive property of the Box-Jenkins model is that it yields separate descriptions of the input-output relationship between $u$ and $y$ and of the noise spectrum, described by $C/D$ and $\{w_k\}$. This is an advantage compared to ARMAX models, which use a single $A$-polynomial both for the poles of the transfer function and for the spectral peaks of the noise model.

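A description that covers transfer function models as well as difference equations is

$$ A(z^{-1})\, y_k = \frac{B(z^{-1})}{F(z^{-1})}\, u_k + \frac{C(z^{-1})}{D(z^{-1})}\, w_k \tag{6.17} $$

where $A, B, C, D, F$ are polynomials in $z^{-1}$ of degrees $n_A, n_B, n_C, n_D, n_F$, respectively. The model set is described by the parameters

$$ \theta = \begin{pmatrix} a_1 & \dots & a_{n_A} & b_0 & \dots & b_{n_B} & c_1 & \dots & c_{n_C} & d_1 & \dots & d_{n_D} & f_1 & \dots & f_{n_F} \end{pmatrix}^T \tag{6.18} $$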
Figure 6.2 Output error identification, where the “output error” provides a comparison between the system and model outputs.

The output error model is a special case of the general model (6.17) with $A = C = D = 1$, i.e.,

$$ y_k = \frac{B(z^{-1})}{F(z^{-1})}\, u_k + w_k \tag{6.19} $$

With the output error method the identification effort is directed toward estimation of the parameters of the $F$- and $B$-polynomials

$$ \min_{B,F} \sum_{k=0}^{N-1} \left| y_k - \frac{B(z^{-1})}{F(z^{-1})}\, u_k \right|^2 \tag{6.20} $$


The criterion (6.20) is not exactly the same as a prediction error criterion (see Figs. 6.1 and 6.2). A major difference from prediction error methods is that the model response of the output error method, i.e., $(B/F)u$ in (6.20), is not a function of measured values of $y$ some steps back. It is therefore open to question whether the term “prediction error method” is justified for model-fitting methods of the type (6.20). Some of the differences between output error models and prediction error models appear in the following example.


Example 6.2—Difference between output error and prediction error 
The difference in model equations between equation error (prediction error) analysis and output error analysis can be demonstrated by the two predictors

$$ \begin{aligned} \hat y_k &= \hat a\, \hat y_{k-1} + \hat b\, u_{k-1}, &&\text{Output error method} \\ \hat y_k &= \hat a\, y_{k-1} + \hat b\, u_{k-1}, &&\text{Prediction error method} \end{aligned} \tag{6.21} $$
Figure 6.3 Loss functions with level surfaces of the output error method and the prediction error method as applied in Example 6.2 (horizontal axes: parameter estimate of $a$). The true parameters of the first-order system are $a = 0.9$ and $b = -0.1$. Notice the local minima.


The output error method thus relies more on the accuracy of the model over longer horizons, since the model output is propagated without correction by measured outputs. The output error loss function (6.20) and the prediction error loss function (6.11) are both least-squares problems, although with fundamental differences, inasmuch as output error identification is a nonlinear estimation problem whereas equation error identification is a linear estimation problem. The two methods also have different properties with respect to parameter optimization, where the output error method may exhibit local minima; cf. Fig. 6.3. ■
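A sketch of how loss curves like those behind Fig. 6.3 can be evaluated: the code computes the output error loss (6.20) and the one-step prediction error loss for the two predictors (6.21) on a grid of values of $a$, with $b$ held at its true value. As an assumption for this illustration, the data are generated with the output error structure (white noise added to the noise-free output of a system with $a = 0.9$, $b = -0.1$); in that setting the equation error minimizer is biased toward zero, while the output error minimizer is not.

```python
import random


def output_error_data(a=0.9, b=-0.1, n=1000, noise_std=0.1, seed=2):
    """Noise-free output x_k = a x_{k-1} + b u_{k-1}, measured as y_k = x_k + e_k."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    x = [0.0] * n
    for k in range(1, n):
        x[k] = a * x[k - 1] + b * u[k - 1]
    y = [xk + rng.gauss(0.0, noise_std) for xk in x]
    return u, y


def loss_output_error(a, b, u, y):
    """Criterion (6.20): the model output is simulated from u only."""
    y_model, loss = 0.0, 0.0
    for k in range(1, len(y)):
        y_model = a * y_model + b * u[k - 1]
        loss += (y[k] - y_model) ** 2
    return loss


def loss_prediction_error(a, b, u, y):
    """One-step prediction (equation) error: the predictor uses measured y_{k-1}."""
    return sum((y[k] - a * y[k - 1] - b * u[k - 1]) ** 2 for k in range(1, len(y)))


u, y = output_error_data()
grid = [0.5 + 0.01 * i for i in range(50)]        # a = 0.50, 0.51, ..., 0.99
a_oe = min(grid, key=lambda a: loss_output_error(a, -0.1, u, y))
a_pe = min(grid, key=lambda a: loss_prediction_error(a, -0.1, u, y))
print(f"argmin OE: a = {a_oe:.2f}, argmin PE: a = {a_pe:.2f}")
```

Evaluating the losses over a two-dimensional $(a, b)$ grid in the same way gives level curves of the kind shown in Fig. 6.3; the output error loss is not quadratic in $a$, which is what allows local minima.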

In the following example we demonstrate an implementation of an output 
error identification algorithm. 

Example 6.3—An algorithm for output error identification 
An iterative solution to the problem of output error identification starts by applying a standard least-squares identification to find an initial estimate of $F$ and $B$ from the model

$$ F(z^{-1})\, y_k = B(z^{-1})\, u_k + v_k \tag{6.22} $$
Figure 6.4 Error models for identification of time-series models. The block diagrams show input error (upper left), output error (upper right), prediction error or equation error (lower left), and the use of prewhitening filters (lower right).


A second step is to filter the data according to

$$ y_k^F = \frac{1}{\hat F(z^{-1})}\, y_k, \qquad u_k^F = \frac{1}{\hat F(z^{-1})}\, u_k \tag{6.23} $$

where $\hat F$ is the most recent estimate of $F$, with a subsequent least-squares estimation of $F$ and $B$ from the model

$$ F(z^{-1})\, y_k^F = B(z^{-1})\, u_k^F + v_k \tag{6.24} $$

The filtering and estimation steps are continued until the residual noise sequence $\{y_k - \hat y_k\}$ is minimized in the least-squares sense and the parameter estimates have converged. ■
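The two steps of Example 6.3 can be sketched for a first-order model $F(z^{-1}) = 1 + f z^{-1}$, $B(z^{-1}) = b z^{-1}$ (an assumption made to keep the code short). The data are simulated, as an illustration, from an output error system with $f = -0.9$, $b = 0.1$ and white measurement noise. Each iteration prefilters $y$ and $u$ with $1/\hat F(z^{-1})$, using the latest estimate, and repeats the least-squares fit (6.24). This scheme is closely related to what is known as the Steiglitz-McBride iteration; no convergence safeguards are included.

```python
import random


def fit_first_order(yf, uf):
    """LS fit of yf_k = -f*yf_{k-1} + b*uf_{k-1}, i.e. F(z^-1) yf = B(z^-1) uf + v."""
    p1 = [-v for v in yf[:-1]]             # regressor -yf_{k-1}
    p2 = uf[:-1]                           # regressor  uf_{k-1}
    t = yf[1:]                             # target     yf_k
    s11 = sum(a * a for a in p1)
    s12 = sum(a * c for a, c in zip(p1, p2))
    s22 = sum(c * c for c in p2)
    r1 = sum(a * v for a, v in zip(p1, t))
    r2 = sum(c * v for c, v in zip(p2, t))
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


def inverse_filter(x, f):
    """Apply 1/F(z^-1) with F = 1 + f z^-1: xf_k = x_k - f * xf_{k-1}."""
    out, prev = [], 0.0
    for v in x:
        prev = v - f * prev
        out.append(prev)
    return out


def output_error_iterations(u, y, iterations=10):
    f, b = fit_first_order(y, u)                    # Eq. (6.22): ordinary LS
    history = [(f, b)]
    for _ in range(iterations):                     # Eqs. (6.23)-(6.24)
        f, b = fit_first_order(inverse_filter(y, f), inverse_filter(u, f))
        history.append((f, b))
    return history


rng = random.Random(3)
n = 2000
u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
x = [0.0] * n                                       # noise-free output (B/F)u, f = -0.9, b = 0.1
for k in range(1, n):
    x[k] = 0.9 * x[k - 1] + 0.1 * u[k - 1]
y = [xk + rng.gauss(0.0, 0.1) for xk in x]          # white measurement noise
history = output_error_iterations(u, y)
(f0, b0), (f_hat, b_hat) = history[0], history[-1]
print(f"initial LS: f = {f0:.4f}, b = {b0:.4f}; final: f = {f_hat:.4f}, b = {b_hat:.4f}")
```

In this setup the initial least-squares estimate of $f$ is noticeably biased toward zero, and the prefiltered iterations move it close to the true value.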

The procedure of regressor filtering is often called prewhitening, and the filter is called a prewhitening filter (see Fig. 6.4). The term is justified in cases where the filtering results in white-noise residuals. A standard use of prewhitening allows effective reduction of bias caused by correlated noise, and this principle is incorporated into several estimation algorithms. An example is found in Eq. (6.23), where the filters $1/F(z^{-1})$ act as prewhitening filters.

A modified form of output error identification, known as input error identification, models the system input $\{u_k\}$ (see Fig. 6.4). The method consists of fitting an inverse model by matching the modeled input to data, with the system output $\{y_k\}$ as model input. The corresponding input error $\tilde u_k = \hat u_k - u_k$ thus yields the misfit between the modeled input and data. Input error identification is relevant for problems of input estimation, deconvolution, and other similar filtering and signal restoration problems. A problem in this context is how to handle the causality problems inherent in inverse modeling.


6.3 MAXIMUM-LIKELIHOOD IDENTIFICATION


In maximum-likelihood (ML) identification we select the estimate which renders the given observations $\mathcal{Y}$ most probable. This is accomplished by maximizing the likelihood function $p(\mathcal{Y}|\theta)$,

$$ \max_\theta p(\mathcal{Y}|\theta) = p(\mathcal{Y}|\hat\theta) \tag{6.25} $$

which yields the estimate $\theta = \hat\theta$ that maximizes $p(\mathcal{Y}|\theta)$. (A comparison between the loss functions of the maximum-likelihood method and the least-squares method is shown in Fig. 6.5.)

Example 6.4—Maximum-likelihood estimation and Markov estimates 
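Assume that $\mathcal{Y} = \Phi\theta + v$ with known mean $\mathrm{E}\{v\} = 0$ and known covariance $\mathrm{E}\{vv^T\} = \Sigma_v$. If $v$ is assumed to be normally distributed and it has $N$ elements, its probability density function is

$$ p(v) = \big((2\pi)^N \det \Sigma_v\big)^{-1/2} \exp\!\big(-\tfrac12 v^T \Sigma_v^{-1} v\big) \tag{6.26} $$

Under the assumption $\mathcal{Y} = \Phi\theta + v$ we have

$$ p(\mathcal{Y}|\theta) = \big((2\pi)^N \det \Sigma_v\big)^{-1/2} \exp\!\big(-\tfrac12 (\mathcal{Y} - \Phi\theta)^T \Sigma_v^{-1} (\mathcal{Y} - \Phi\theta)\big) \tag{6.27} $$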





Chap. 6 Identification of time-series models 



Figure 6.5 Loss functions with level surfaces of the maximum-likelihood method and the least-squares method (panels: maximum-likelihood and least-squares loss functions; horizontal axes: parameter estimate of $a$).


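which we call a likelihood function. We may choose to consider Eq. (6.27) as a function of a parameter vector $\theta$ and prediction errors $\varepsilon(\theta) = \mathcal{Y} - \Phi\theta$. It is practical to take the logarithm of this function, and then we have

$$ \log L(\theta) = \log p(\varepsilon|\theta) = -\tfrac12 \log\big((2\pi)^N \det\Sigma_v\big) - \tfrac12 (\mathcal{Y} - \Phi\theta)^T \Sigma_v^{-1} (\mathcal{Y} - \Phi\theta) \tag{6.28} $$

Here we expect the log-likelihood function to have a maximum at $\theta = \hat\theta$, where $\varepsilon(\hat\theta)$ is close to $v$. As only the quadratic term depends on $\theta$, maximizing (6.28) is equivalent to minimizing $(\mathcal{Y} - \Phi\theta)^T \Sigma_v^{-1} (\mathcal{Y} - \Phi\theta)$, which gives $\hat\theta = (\Phi^T\Sigma_v^{-1}\Phi)^{-1}\Phi^T\Sigma_v^{-1}\mathcal{Y}$. Hence, the maximum-likelihood estimator for a model linear in the parameters and with normally distributed noise of known covariance is identical to the Markov estimator; cf. Eq. (5.70). In the case of disturbances with the covariance matrix $\Sigma_v = \sigma^2 I$, the maximum-likelihood estimator reduces to the least-squares estimator. ■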
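A small numerical check of Example 6.4 under illustrative assumptions: one parameter, regressors $\varphi_k$ drawn at random, and a known diagonal covariance $\Sigma_v$ whose entries alternate between a precise and a noisy measurement. The Markov estimate maximizes the log-likelihood (6.28), and with equal variances it coincides with the ordinary least-squares estimate.

```python
import math
import random


def log_likelihood(theta, phi, y, var):
    """Eq. (6.28) for a scalar parameter and Sigma_v = diag(var)."""
    n = len(y)
    log_det = sum(math.log(s) for s in var)
    quad = sum((yk - pk * theta) ** 2 / s for yk, pk, s in zip(y, phi, var))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det) - 0.5 * quad


def markov_estimate(phi, y, var):
    """(Phi^T S^-1 Phi)^-1 Phi^T S^-1 Y for a diagonal S = diag(var)."""
    num = sum(pk * yk / s for pk, yk, s in zip(phi, y, var))
    den = sum(pk * pk / s for pk, s in zip(phi, var))
    return num / den


rng = random.Random(4)
n, theta_true = 200, 2.0
phi = [rng.uniform(0.5, 1.5) for _ in range(n)]
var = [0.01 if k % 2 == 0 else 1.0 for k in range(n)]        # known noise variances
y = [theta_true * pk + rng.gauss(0.0, math.sqrt(s)) for pk, s in zip(phi, var)]

theta_ml = markov_estimate(phi, y, var)
theta_ls = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)   # ignores var
print(f"Markov/ML: {theta_ml:.4f}, ordinary LS: {theta_ls:.4f}")
```

Weighting each residual by the inverse of its known variance lets the precise measurements dominate, which is exactly what the ordinary least-squares estimate ignores.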

Remark: If it is required to optimize, say, the empirical prediction error covariance or the parameter error covariance by means of ordinary optimization methods, it is necessary to formulate a mapping of the covariance matrix to some scalar function with appropriate properties of optimality. Suitable optimization criteria are, for instance,

$$ \begin{aligned} J_1(\theta) &= \ell_1(Q) = \log\det Q, \qquad Q(\theta) = \frac{1}{N}\,\varepsilon(\theta)^T \varepsilon(\theta) \\ J_2(\theta) &= \ell_2(Q) = \operatorname{tr}(WQ) \end{aligned} \tag{6.29} $$

where the rows of $\varepsilon(\theta)$ are the prediction error vectors and $W = W^T > 0$ is some weighting matrix. Both functions $\ell_1$, $\ell_2$ have the property

$$ \begin{aligned} \ell_1(Q) &> \ell_1(Q_0) \quad \text{if } Q > Q_0, \quad Q, Q_0 \in \mathbb{R}^{n\times n} \\ \ell_2(Q) &> \ell_2(Q_0) \quad \text{if } Q > Q_0, \quad Q, Q_0 \in \mathbb{R}^{n\times n} \end{aligned} \tag{6.30} $$

The proof is called for in Exercise 6.9. ■

There is actually a lower limit to the covariance of any unbiased estimator, such as one obtained by maximum-likelihood identification. This is known as the Cramer-Rao lower bound.

Theorem 6.1—The Cramer-Rao lower bound 

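Let $\mathcal{Y}$ be observations of a stochastic variable, the distribution of which depends on an unknown vector $\theta$. Let $L(\mathcal{Y}, \theta)$ denote the likelihood function, and let $\hat\theta = \hat\theta(\mathcal{Y})$ be an arbitrary unbiased estimate of $\theta$. Then

$$ \operatorname{Cov}(\hat\theta) \ge M^{-1}, \qquad M = \mathrm{E}\left\{ \left(\frac{\partial \log L(\mathcal{Y},\theta)}{\partial\theta}\right)^{T} \frac{\partial \log L(\mathcal{Y},\theta)}{\partial\theta} \right\} = -\mathrm{E}\left\{ \frac{\partial^2 \log L(\mathcal{Y},\theta)}{\partial\theta^2} \right\} \tag{6.31} $$

(A proof is to be found in Appendix B.)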


The matrix $M$, whose inverse defines the Cramer-Rao lower bound, is called the Fisher information matrix, and an estimate which achieves the lower bound is referred to as efficient.
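For the linear model with Gaussian noise of Example 6.4, the bound (6.31) can be evaluated explicitly. The short derivation below uses only (6.28) and standard matrix calculus, and shows that the Markov (maximum-likelihood) estimate is efficient in that case:

$$ \frac{\partial \log L}{\partial\theta} = (\mathcal{Y} - \Phi\theta)^T \Sigma_v^{-1} \Phi, \qquad -\frac{\partial^2 \log L}{\partial\theta^2} = \Phi^T \Sigma_v^{-1} \Phi \;\Rightarrow\; \operatorname{Cov}(\hat\theta) \ge \big(\Phi^T \Sigma_v^{-1} \Phi\big)^{-1} $$

$$ \hat\theta - \theta = \big(\Phi^T\Sigma_v^{-1}\Phi\big)^{-1}\Phi^T\Sigma_v^{-1} v \;\Rightarrow\; \operatorname{Cov}(\hat\theta) = \big(\Phi^T\Sigma_v^{-1}\Phi\big)^{-1}\Phi^T\Sigma_v^{-1}\,\Sigma_v\,\Sigma_v^{-1}\Phi\big(\Phi^T\Sigma_v^{-1}\Phi\big)^{-1} = \big(\Phi^T\Sigma_v^{-1}\Phi\big)^{-1} $$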


Autoregressive moving average models with exogenous input (ARMAX) constitute a model set general enough to describe colored noise that cannot be described by ARX models of the type (6.7). Similar to Eq. (6.4), we consider ARMAX models of the type

$$ A(z^{-1})\, y_k = z^{-d} B(z^{-1})\, u_k + C(z^{-1})\, v_k \tag{6.32} $$

where the noise covariance matrix $\Sigma_v = \mathrm{E}\{vv^T\}$ is now assumed to be unknown. Formulation of a maximum-likelihood problem involves the calculation of a likelihood function $L(\theta)$. Considering the case of normally distributed noise, we have

$$ L(\theta) = \frac{1}{(2\pi)^{N/2} (\det\Sigma_v)^{1/2}} \exp\!\big(-\tfrac12 \varepsilon^T(\theta)\, \Sigma_v^{-1}\, \varepsilon(\theta)\big) \tag{6.33} $$
Figure 6.6 Block diagram showing the approximate maximum-likelihood estimation of an ARMAX model with a time delay of $d$ samples. Minimization of the residuals, and consequently of the prediction error, is accomplished by adjustment of $A$, $B$, and $C$.


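$$ \log L(\theta) = -\tfrac12 \log\big((2\pi)^N \det\Sigma_v\big) - \tfrac12 \varepsilon^T(\theta)\, \Sigma_v^{-1}\, \varepsilon(\theta) \tag{6.34} $$

with $\varepsilon$ containing the components $\varepsilon_k = y_k - \varphi_k^T\theta$ and

$$ \begin{aligned} y_k = {} & -a_1 y_{k-1} - \dots - a_{n_A} y_{k-n_A} \\ & + b_1 u_{k-d-1} + \dots + b_{n_B} u_{k-d-n_B} \\ & + v_k + c_1 v_{k-1} + \dots + c_{n_C} v_{k-n_C} = \varphi_k^T\theta + v_k \end{aligned} \tag{6.35} $$

with

$$ \begin{aligned} \varphi_k &= \begin{pmatrix} -y_{k-1} & \dots & -y_{k-n_A} & u_{k-d-1} & \dots & u_{k-d-n_B} & v_{k-1} & \dots & v_{k-n_C} \end{pmatrix}^T \\ \theta &= \begin{pmatrix} a_1 & \dots & a_{n_A} & b_1 & \dots & b_{n_B} & c_1 & \dots & c_{n_C} \end{pmatrix}^T \end{aligned} \tag{6.36} $$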
where it is a problem that the regressor components $v_{k-1}, \dots, v_{k-n_C}$ are not known. Another problem is that the optimization criterion is a function of both $\theta$ and $\Sigma_v$, and $\Sigma_v$ is not known either. In the absence of the desired parameters, it is therefore only feasible to find approximate solutions by computing successively better estimates of the covariance matrix $\Sigma_v$ and the parameters $\theta$ through some iterative procedure (cf. Fig. 6.6). In the important special case of normally distributed white noise with $\Sigma_v = \sigma_v^2 I$ and $\sigma_v^2$ unknown, we replace Eq. (6.34) by the empirical likelihood function

$$ \log L(\theta, \sigma_v^2) = -\frac{N}{2}\log(2\pi) - \frac{1}{2\sigma_v^2}\sum_{k=1}^{N}\varepsilon_k^2(\theta) - \frac{N}{2}\log\sigma_v^2 = -\frac{N}{2}\log(2\pi) - \frac{1}{\sigma_v^2} V_N(\theta) - \frac{N}{2}\log\sigma_v^2 \tag{6.37} $$


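where

$$ V_N(\theta) = \frac12 \sum_{k=1}^{N} \varepsilon_k^2(\theta) = \frac12\, \varepsilon^T(\theta)\, \varepsilon(\theta) \tag{6.38} $$

The gradient and the second-order derivatives of $\log L(\theta, \sigma_v^2)$ determine the extrema of $\log L$ according to the equations

$$ 0 = \frac{\partial}{\partial\sigma_v^2} \log L(\theta, \sigma_v^2) = \frac{V_N(\theta)}{\sigma_v^4} - \frac{N}{2\sigma_v^2}, \qquad 0 = \nabla_\theta \log L(\theta, \sigma_v^2) = -\frac{1}{\sigma_v^2}\nabla V_N(\theta) \tag{6.39} $$

with the solution

$$ \hat\sigma_v^2 = \frac{2}{N} V_N(\hat\theta), \qquad \nabla V_N(\hat\theta) = 0 \tag{6.40} $$

A numerical solution to the problem

$$ \nabla V_N(\theta) = V_N'(\theta) = 0 \tag{6.41} $$

can be obtained as an iterative procedure via the Newton(-Raphson) method

$$ \theta^{(i+1)} = \theta^{(i)} - \alpha_i \big(V_N''(\theta^{(i)})\big)^{-1} V_N'(\theta^{(i)}) \tag{6.42} $$

where $\alpha_i$ denotes the step length and $(i)$ the iteration index. The initial estimate is usually chosen as a least-squares estimate. The elements of this computation can be given the form

$$ \begin{aligned} V_N(\theta) &= \frac12 \sum_{k=1}^{N} \varepsilon_k^2(\theta) \\ \psi_k(\theta) &= -\nabla \varepsilon_k(\theta) \\ \nabla V_N(\theta) &= V_N'(\theta) = -\sum_{k=1}^{N} \varepsilon_k(\theta)\, \psi_k(\theta) \\ \nabla^2 V_N(\theta) &= V_N''(\theta) = \sum_{k=1}^{N} \psi_k(\theta)\, \psi_k^T(\theta) + \sum_{k=1}^{N} \varepsilon_k(\theta)\, \nabla^2 \varepsilon_k(\theta) \end{aligned} \tag{6.43} $$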

The Newton method is a good numerical procedure with “quadratic” convergence properties; see Appendix C. The method must be modified, however, when $V_N''$ is not invertible, which may constitute a problem at some distance from the solution. A similar problem arises when the model set is not uniquely parametrized or when a poor choice of optimization criterion has been made. It is therefore an advantage to start the iterative algorithm with good initial values obtained from some other numerical algorithm or from a least-squares estimate. For the same reason it is difficult to ensure global convergence by using the Newton method with arbitrary initial conditions.
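A minimal sketch of the iteration (6.42) for a first-order ARMAX model, written with the sign convention of Example 6.5, $y_k = a y_{k-1} + b u_{k-1} + w_k + c w_{k-1}$. It is a Gauss-Newton variant rather than the full Newton method: the second sum in $V_N''$ of (6.43) is dropped, which keeps the Hessian approximation positive semidefinite, and the step length $\alpha_i$ is halved until $V_N$ decreases. The gradients $\psi_k$ are obtained by differentiating the predictor recursion, and the initial value is the least-squares (ARX) estimate with $c = 0$. The simulated data and the noise level are assumptions for illustration.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for j in range(i, n + 1):
                M[r][j] -= f * M[i][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def errors_and_gradients(theta, u, y):
    """eps_k(theta) and psi_k = -d eps_k / d theta for the predictor of (6.44)."""
    a, b, c = theta
    eps, psi = [0.0], [[0.0, 0.0, 0.0]]             # k = 0 has no past data
    for k in range(1, len(y)):
        e_prev, p_prev = eps[-1], psi[-1]
        eps.append(y[k] - a * y[k - 1] - b * u[k - 1] - c * e_prev)
        psi.append([y[k - 1] - c * p_prev[0],
                    u[k - 1] - c * p_prev[1],
                    e_prev - c * p_prev[2]])
    return eps, psi


def loss(theta, u, y):
    eps, _ = errors_and_gradients(theta, u, y)
    return 0.5 * sum(e * e for e in eps)            # V_N(theta), Eq. (6.38)


def gauss_newton(theta, u, y, iterations=20):
    for _ in range(iterations):
        eps, psi = errors_and_gradients(theta, u, y)
        H = [[sum(p[i] * p[j] for p in psi) for j in range(3)] for i in range(3)]
        g = [sum(e * p[i] for e, p in zip(eps, psi)) for i in range(3)]  # equals -V_N'
        step = solve(H, g)
        v_old, alpha = 0.5 * sum(e * e for e in eps), 1.0
        while alpha > 1e-4:                         # step length alpha_i in (6.42)
            cand = [t + alpha * s for t, s in zip(theta, step)]
            if abs(cand[2]) < 1.0 and loss(cand, u, y) < v_old:
                theta = cand
                break
            alpha /= 2.0
        else:
            break                                   # no decrease found: stop
    return theta


rng = random.Random(6)
n = 3000
u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
y = [0.0] * n
for k in range(1, n):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1] + w[k] + 0.7 * w[k - 1]

# Initial value: least-squares ARX fit (c = 0)
rows = [(y[k - 1], u[k - 1]) for k in range(1, n)]
A = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
rhs = [sum(r[i] * y[k] for r, k in zip(rows, range(1, n))) for i in range(2)]
theta0 = solve(A, rhs) + [0.0]
theta_ml = gauss_newton(theta0, u, y)
print("LS:", [round(t, 4) for t in theta0], "ML:", [round(t, 4) for t in theta_ml])
```

Dropping the second-derivative term is a common design choice because that term is small near the optimum when the residuals are nearly white, and the remaining approximation never produces an ascent direction.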

Example 6.5—A comparison of LS- and ML-identification
Consider data generated by the system in Example 6.1, here written with the sign convention

$$ \mathcal{S}: \; y_k = a y_{k-1} + b u_{k-1} + w_k + c w_{k-1} \tag{6.44} $$

with $\mathrm{E}\{w_k\} = 0$ and $\mathrm{E}\{w_i w_j\} = \sigma^2 \delta_{ij}$, and with $N = 1000$ samples collected (see Fig. 6.7). Application of least-squares (LS) identification and maximum-likelihood (ML) identification yielded the following results:

                      â        b̂        ĉ        V     σ̂²      ||θ̃||
LS identification     0.9493   0.0398   —        723   1.446   0.0778
ML identification     0.8992   0.0857   0.7072   501   1.008   0.0143

where $\|\tilde\theta\|$ is the norm of the parameter errors of $a$ and $b$ only (true values $a = 0.9$, $b = 0.1$). These results suggest that maximum-likelihood identification performs better than least-squares identification in the case of colored noise. ■

Example 6.6—Pseudolinear regression 

Assume that data $\{y_k\}$, $\{u_k\}$ are generated from the system

$$ \mathcal{S}: \; A(z^{-1})\, y_k = B(z^{-1})\, u_k + C(z^{-1})\, w_k \tag{6.45} $$
Figure 6.7 LS- and ML-identification of an object described by the ARMAX model $y_k = 0.9 y_{k-1} + 0.1 u_{k-1} + e_k + 0.7 e_{k-1}$. The lower graphs (left and right) show the squared residuals $\varepsilon_k^2$ for the least-squares (LS) and maximum-likelihood (ML) estimation methods, respectively (horizontal axes: time [s]).

where $\{w_k\}$ is a white-noise sequence. An alternative to the iterative numerical optimization based on (6.43) is to estimate high-order polynomials $A$ and $B$ by means of least-squares identification which, in turn, allows the computation of the residuals $\{\varepsilon_k\}$. As the model order is high, it may be assumed that the computed residual sequence $\{\varepsilon_k\}$ yields a good approximation of the white-noise sequence $\{w_k\}$. A second step of least-squares identification can then be applied, using the extended regressors

$$ \varphi_k = \begin{pmatrix} -y_{k-1} & \dots & -y_{k-n_A} & u_{k-1} & \dots & u_{k-n_B} & \varepsilon_{k-1} & \dots & \varepsilon_{k-n_C} \end{pmatrix}^T, \qquad k = 1, 2, \dots, N \tag{6.46} $$

by means of which the $A$-, $B$-, and $C$-polynomials are estimated. If this method is applied to the data of Example 6.5 with a twentieth-order primary model and a second model according to Eq. (6.44), a competitive result is obtained:

              â        b̂        ĉ        V     σ̂²      ||θ̃||
Two-step LS   0.9011   0.0914   0.6495   516   1.032   0.0086

The method is known as pseudolinear regression or the two-step linear regression approach. ■
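A sketch of the two-step procedure under illustrative assumptions: first-order ARMAX data with the sign convention of (6.44), so the regressor contains $+y_{k-1}$ rather than the $-y_{k-1}$ of (6.46); a primary ARX model of order 12 instead of 20 to keep the pure-Python computation fast; and a generic Gaussian-elimination solver for the normal equations.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for j in range(i, n + 1):
                M[r][j] -= f * M[i][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def least_squares(rows, target):
    """Solve the normal equations (Phi^T Phi) theta = Phi^T Y."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    return solve(A, rhs)


def two_step_ls(u, y, n_high=12):
    n = len(y)
    # Step 1: high-order ARX model; its residuals approximate the white noise w_k
    ks = range(n_high, n)
    rows = [[y[k - i] for i in range(1, n_high + 1)] +
            [u[k - i] for i in range(1, n_high + 1)] for k in ks]
    coef = least_squares(rows, [y[k] for k in ks])
    eps = [0.0] * n
    for k, row in zip(ks, rows):
        eps[k] = y[k] - sum(c * r for c, r in zip(coef, row))
    # Step 2: linear regression with the extended regressor (y_{k-1}, u_{k-1}, eps_{k-1})
    ks2 = range(n_high + 1, n)
    rows2 = [[y[k - 1], u[k - 1], eps[k - 1]] for k in ks2]
    return least_squares(rows2, [y[k] for k in ks2])


rng = random.Random(7)
n = 2000
u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
y = [0.0] * n
for k in range(1, n):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1] + w[k] + 0.7 * w[k - 1]

a_hat, b_hat, c_hat = two_step_ls(u, y)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, c = {c_hat:.4f}")
```

Both steps are ordinary linear least-squares problems, so no iterative search is needed; the price is that the quality of $\hat c$ depends on how well the residuals of the primary model approximate $w_k$.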
6.4 KALMAN FILTER

Now let us consider a general estimation and filtering problem. Consider the state-space model

$$ \begin{aligned} x_{k+1} &= \Phi x_k + \Gamma u_k + v_k \\ y_k &= C x_k + D u_k + e_k \end{aligned} \tag{6.47} $$

with $\mathrm{E}\{v_k\} = 0$, $\mathrm{E}\{e_k\} = 0$, and

$$ \mathrm{E}\{v v^T\} = R_1, \qquad \mathrm{E}\{e e^T\} = R_2, \qquad P(0) = \mathrm{E}\{x_0 x_0^T\} = R_0 \tag{6.48} $$

Assume the noisy input-output data $\{y_k\}$ and $\{u_k\}$ to be the only data available, and that the state $x_k$ is not available for measurement. The problem of optimal estimation of $x_k$ based on input-output data and knowledge of the model (6.47) can be solved by minimizing

$$ J(\hat x_k) = \mathrm{E}\{\|x_k - \hat x_{k|k-1}\|^2\}, \qquad k = 1, 2, 3, \dots \tag{6.49} $$

subject to (6.47). The Kalman filter (whose continuous-time counterpart is the Kalman-Bucy filter) for prediction of $x$ based on the data available at time $k$ is

$$ \begin{aligned} \hat x_{k+1|k} &= \Phi \hat x_{k|k-1} + \Gamma u_k + K_k (y_k - C\hat x_{k|k-1} - D u_k) \\ K_k &= \Phi P_k C^T (R_2 + C P_k C^T)^{-1} \\ P_{k+1} &= \Phi P_k \Phi^T + R_1 - \Phi P_k C^T (R_2 + C P_k C^T)^{-1} C P_k \Phi^T \end{aligned} \tag{6.50} $$

which is a recursive equation where the estimates are updated as soon as new input-output data are available. In particular, the Kalman filter minimizes Eq. (6.49) in cases where the noise components $v_k$ and $e_k$ are independent and normally distributed. The solution obtained is quite general, and the method has a vast scope of application. A case of specific relevance for system identification is illustrated in the following example.

Example 6.7—Kalman filter for identification

The following formulation of the identification problem is useful for estimation of time-varying parameters:

$$ \begin{aligned} \theta_{k+1} &= \theta_k + v_k, & \mathrm{E}\{v_k\} &= 0 \\ y_k &= \varphi_k^T \theta_k + e_k, & \mathrm{E}\{e_k\} &= 0 \end{aligned} \tag{6.51} $$
with the standard covariance matrices $R_1 = \mathrm{E}\{v_k v_k^T\}$ and $R_2 = \mathrm{E}\{e_k e_k^T\}$. The Kalman filter for this problem is

$$ \begin{aligned} \hat\theta_{k+1} &= \hat\theta_k + K_k (y_k - \varphi_k^T \hat\theta_k) \\ K_k &= P_k \varphi_k (R_2 + \varphi_k^T P_k \varphi_k)^{-1} \\ P_{k+1} &= P_k + R_1 - P_k \varphi_k (R_2 + \varphi_k^T P_k \varphi_k)^{-1} \varphi_k^T P_k \end{aligned} \tag{6.52} $$

which is an effective method for identification of time-varying systems. Notice that $R_1$ and $R_2$ are important design parameters that should match the temporal variations of $\theta_k$ and the observation noise, respectively.
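A minimal sketch of the filter (6.52) for a scalar output, under assumptions that are not part of the text: $R_1 = r_1 I$, a scalar $R_2 = r_2$, an initial estimate $\hat\theta_0 = 0$ with $P_0 = p_0 I$, and simulated data in which one parameter jumps halfway through the record.

```python
import random


def kalman_parameter_tracker(phis, ys, r1, r2, p0=100.0):
    """Kalman filter (6.52) for theta_{k+1} = theta_k + v_k, y_k = phi_k^T theta_k + e_k."""
    n = len(phis[0])
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    history = []
    for phi, y in zip(phis, ys):
        P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        s = r2 + sum(phi[i] * P_phi[i] for i in range(n))      # R2 + phi^T P phi
        K = [v / s for v in P_phi]                               # gain K_k
        innovation = y - sum(phi[i] * theta[i] for i in range(n))
        theta = [theta[i] + K[i] * innovation for i in range(n)]
        # P_{k+1} = P_k + R1 - P phi (R2 + phi^T P phi)^{-1} phi^T P  (P is symmetric)
        P = [[P[i][j] + (r1 if i == j else 0.0) - P_phi[i] * P_phi[j] / s
              for j in range(n)] for i in range(n)]
        history.append(theta)
    return history


rng = random.Random(8)
n = 1000
true = [(1.0, 0.5) if k < n // 2 else (-1.0, 0.5) for k in range(n)]   # jump in theta_1
phis = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
ys = [t[0] * p[0] + t[1] * p[1] + rng.gauss(0.0, 0.1) for t, p in zip(true, phis)]

history = kalman_parameter_tracker(phis, ys, r1=1e-4, r2=0.01)
print("before jump:", [round(v, 3) for v in history[n // 2 - 1]])
print("end:        ", [round(v, 3) for v in history[-1]])
```

Increasing `r1` keeps $P_k$ larger, which makes the filter track parameter changes faster at the price of noisier estimates; with `r1 = 0` the filter reduces to recursive least squares and gradually stops adapting.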


6.5 INSTRUMENTAL VARIABLE METHOD 

Consider the linear regression model

$$ \mathcal{Y} = \Phi\theta + v \tag{6.53} $$

with the disturbance vector $v \in \mathbb{R}^N$, the regressor matrix $\Phi \in \mathbb{R}^{N\times p}$, and the observations $\mathcal{Y} \in \mathbb{R}^N$. Correlation between the regressors and the prediction error leads to bias of the parameter estimates $\hat\theta$ obtained from least-squares solutions to the linear regression problem. Methods that replace the regressor matrix $\Phi$ used in linear regression by some other variable $Z$ are called instrumental variable methods, and the estimate takes the form

$$ \hat\theta_Z = (Z^T\Phi)^{-1} Z^T \mathcal{Y} \tag{6.54} $$

There are two conditions to impose on the instrumental variables $Z$ in order to make the estimator $\hat\theta_Z$ consistent. First, the instrumental variables $Z$ should be uncorrelated with the disturbances, so that $\mathrm{E}\{Z^T v\} = 0$. Second, the matrix $Z^T\Phi$ must be invertible. Provided these conditions are satisfied, the following covariance estimate can be justified (a proof is called for in Exercise 6.8 below):

$$ \operatorname{Cov}(\hat\theta_Z) = \mathrm{E}\{(\hat\theta_Z - \theta)(\hat\theta_Z - \theta)^T\} = (Z^T\Phi)^{-1} Z^T \Sigma_v Z (\Phi^T Z)^{-1} \tag{6.55} $$

Hence, in addition to the two previously imposed conditions, it is necessary that $Z^T\Phi$ be “large” so that $\hat\theta_Z$ provides an accurate, low-variance estimate. In other words, the instrumental variables should be chosen so that they are simultaneously uncorrelated with $v$ and highly correlated with $\Phi$. As we do not know $v$, it is impossible to check how closely $Z$ satisfies the requirement of uncorrelated behavior.

Figure 6.8 Input and output from a first-order system with correlated disturbance inputs.

Example 6.8—Identification with instrumental variable method 

Consider a case with data ($N = 1000$; see Fig. 6.8) collected from the system

$$ \mathcal{S}: \; y_k = 0.9 y_{k-1} + 0.1 u_{k-1} + w_k + 0.1 w_{k-1}, \qquad \mathrm{E}\{w_k\} = 0 \tag{6.56} $$

The parameter vector relevant for the transfer function $Y(z)/U(z)$ is

$$ \theta = \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0.9 \\ 0.1 \end{pmatrix} \tag{6.57} $$

The biased least-squares estimate is

$$ \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = (\Phi^T\Phi)^{-1}\Phi^T\mathcal{Y} \tag{6.58} $$

and is based on the regressor matrix and the observation vector

$$ \Phi = \begin{pmatrix} y_1 & u_1 \\ \vdots & \vdots \\ y_{N-1} & u_{N-1} \end{pmatrix}, \qquad \mathcal{Y} = \begin{pmatrix} y_2 \\ \vdots \\ y_N \end{pmatrix} \tag{6.59} $$

Now introducing the variables

$$ z_k = \hat a z_{k-1} + \hat b u_{k-1} \tag{6.60} $$

which is the predictor of $y_k$ obtained from the least-squares estimate $\hat a$, $\hat b$ driven by the input only. Note that, owing to this special choice of instrumental variable, the method is similar to the output error method. With these variables we can suggest the instrumental variable

$$ Z = \begin{pmatrix} z_1 & u_1 \\ \vdots & \vdots \\ z_{N-1} & u_{N-1} \end{pmatrix} \tag{6.61} $$

The instrumental variable estimate

$$ \hat\theta_Z = (Z^T\Phi)^{-1} Z^T \mathcal{Y} \tag{6.62} $$

exhibits a reduced bias in both parameter estimates as compared to the least-squares estimate. ■
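A sketch of Example 6.8 with simulated data. The noise coefficient $c = 0.9$ (instead of the 0.1 in (6.56)), the noise level, and $N = 20\,000$ are assumptions chosen so that the least-squares bias is clearly visible. The instruments are $z_{k-1}$ from (6.60), simulated with the least-squares estimate, together with $u_{k-1}$, as in (6.61).

```python
import random


def iv_estimate(phi, z, target):
    """theta = (Z^T Phi)^{-1} Z^T Y for two columns; z = phi gives least squares."""
    m11 = sum(zk[0] * pk[0] for zk, pk in zip(z, phi))
    m12 = sum(zk[0] * pk[1] for zk, pk in zip(z, phi))
    m21 = sum(zk[1] * pk[0] for zk, pk in zip(z, phi))
    m22 = sum(zk[1] * pk[1] for zk, pk in zip(z, phi))
    r1 = sum(zk[0] * t for zk, t in zip(z, target))
    r2 = sum(zk[1] * t for zk, t in zip(z, target))
    det = m11 * m22 - m12 * m21
    return (m22 * r1 - m12 * r2) / det, (m11 * r2 - m21 * r1) / det


rng = random.Random(5)
n, c = 20_000, 0.9                                      # c = 0.9 is an illustrative assumption
u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
y = [0.0] * n
for k in range(1, n):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1] + w[k] + c * w[k - 1]

phi = [(y[k - 1], u[k - 1]) for k in range(1, n)]      # rows of Phi, cf. (6.59)
target = y[1:]                                          # Y
a_ls, b_ls = iv_estimate(phi, phi, target)              # ordinary least squares

z = [0.0] * n                                           # z_k = a z_{k-1} + b u_{k-1}, (6.60)
for k in range(1, n):
    z[k] = a_ls * z[k - 1] + b_ls * u[k - 1]
Z = [(z[k - 1], u[k - 1]) for k in range(1, n)]         # instruments, cf. (6.61)
a_iv, b_iv = iv_estimate(phi, Z, target)
print(f"LS: a = {a_ls:.4f}, b = {b_ls:.4f};  IV: a = {a_iv:.4f}, b = {b_iv:.4f}")
```

The instruments depend on the input only, so they are uncorrelated with the colored disturbance, while $z_{k-1}$ remains strongly correlated with the regressor $y_{k-1}$.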


Example 6.9—Difficulties in choosing the instrumental variables 
There is of course no guarantee that all choices of instrumental variables will provide good identification properties. Consider, for instance, the following instrumental variable applied to the data of the previous example:

$$ Z = \begin{pmatrix} u_0 & u_1 \\ u_1 & u_2 \\ \vdots & \vdots \\ u_{N-2} & u_{N-1} \end{pmatrix} \;\Rightarrow\; \hat\theta_Z = (Z^T\Phi)^{-1} Z^T \mathcal{Y} \tag{6.63} $$

The resulting estimate is clearly poor. It is, of course, also necessary to avoid degenerate cases such as

$$ Z = \begin{pmatrix} u_1 & u_1 \\ \vdots & \vdots \\ u_{N-1} & u_{N-1} \end{pmatrix} \tag{6.64} $$

This is obviously not a feasible choice of instrumental variables, as there is linear dependence between the two columns of $Z$, so that $Z^T\Phi$ is not invertible.


A conclusion to be drawn from Example 6.9 is that it might be difficult to choose appropriate instrumental variables. As a result, it is standard procedure to use the instrumental variable method in an iterative manner that incorporates some kind of filtering. Moreover, certain established methods of other origin can be interpreted as instrumental variable methods, e.g., the Yule-Walker equations for estimating the parameters of an autoregressive (AR) process.

Example 6.10—The Yule-Walker equations 
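Consider the AR process in the system

$$ \mathcal{S}: \; A(z^{-1})\, y_k = w_k; \qquad \mathrm{E}\{w_k\} = 0, \quad \mathrm{E}\{w_k^2\} = \sigma^2 \tag{6.65} $$

with $\deg A(z) = n_A$. The Yule-Walker equations express the relationships between the AR parameters and the autocovariance function:

$$ C_{yy}(\tau) = \mathrm{E}\{y_k y_{k-\tau}\} = \mathrm{E}\Big\{\Big(-\sum_{i=1}^{n_A} a_i y_{k-i} + w_k\Big) y_{k-\tau}\Big\} = -\sum_{i=1}^{n_A} a_i C_{yy}(\tau - i) + \mathrm{E}\{w_k y_{k-\tau}\} \tag{6.66} $$

Since $w_k$ is white and uncorrelated with past outputs, $\mathrm{E}\{w_k y_{k-\tau}\}$ equals $\sigma^2$ for $\tau = 0$ and zero for $\tau > 0$, so that

$$ C_{yy}(\tau) = \begin{cases} -\sum_{i=1}^{n_A} a_i C_{yy}(\tau - i) + \sigma^2, & \tau = 0 \\ -\sum_{i=1}^{n_A} a_i C_{yy}(\tau - i), & \tau > 0 \end{cases} \tag{6.67} $$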

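Identification of the coefficients $a_i$ can be approached by choosing numbers $M > n_A$ and $p > n_A$ and by introducing the regressors and instruments

$$ \varphi_k = -\begin{pmatrix} y_{k-1} & \dots & y_{k-n_A} \end{pmatrix}^T, \qquad z_k = \begin{pmatrix} y_{k-p-1} & \dots & y_{k-p-n_A} \end{pmatrix}^T, \qquad k = k_0+1, \dots, k_0+M \tag{6.68} $$

with $k_0 = p + n_A$, and the corresponding matrices

$$ \Phi = \begin{pmatrix} \varphi_{k_0+1}^T \\ \vdots \\ \varphi_{k_0+M}^T \end{pmatrix}, \qquad Z = \begin{pmatrix} z_{k_0+1}^T \\ \vdots \\ z_{k_0+M}^T \end{pmatrix}, \qquad \mathcal{Y} = \begin{pmatrix} y_{k_0+1} \\ \vdots \\ y_{k_0+M} \end{pmatrix}, \qquad \theta = \begin{pmatrix} a_1 \\ \vdots \\ a_{n_A} \end{pmatrix} \tag{6.69} $$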



Sec. 6.6 Some aspects of application 




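so that, since $z_k$ is uncorrelated with $w_k$, the normalized products $\frac{1}{M}Z^T\Phi$ and $\frac{1}{M}Z^T\mathcal{Y}$ are sample estimates of the two sides of the Yule-Walker relations (6.67) for $\tau = p+1, \dots, p+n_A$:

$$ -\begin{pmatrix} C_{yy}(p) & C_{yy}(p-1) & \cdots & C_{yy}(p+1-n_A) \\ C_{yy}(p+1) & C_{yy}(p) & \cdots & C_{yy}(p+2-n_A) \\ \vdots & & \ddots & \vdots \\ C_{yy}(p+n_A-1) & C_{yy}(p+n_A-2) & \cdots & C_{yy}(p) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n_A} \end{pmatrix} = \begin{pmatrix} C_{yy}(p+1) \\ C_{yy}(p+2) \\ \vdots \\ C_{yy}(p+n_A) \end{pmatrix} \tag{6.70} $$

where the left-hand matrix corresponds to $\frac{1}{M}Z^T\Phi$ and the right-hand side to $\frac{1}{M}Z^T\mathcal{Y}$. This provides an estimate of the AR parameters as

$$ \hat\theta = (Z^T\Phi)^{-1} Z^T \mathcal{Y} \tag{6.71} $$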

■ 
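A sketch of the instrumental variable form (6.71) of the Yule-Walker estimate, with regressors $\varphi_k = -(y_{k-1} \dots y_{k-n_A})^T$ and delayed outputs $z_k = (y_{k-p-1} \dots y_{k-p-n_A})^T$ as instruments. With $p = 0$ the instruments equal $-\varphi_k$ and the estimate coincides with least squares; larger $p$ gives more strongly delayed instruments, which typically increases the variance. The AR(2) test system $A(z^{-1}) = 1 - 1.5z^{-1} + 0.7z^{-2}$ and the record length are illustrative assumptions.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for j in range(i, n + 1):
                M[r][j] -= f * M[i][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ar_iv_estimate(y, n_a, p=0):
    """theta = (Z^T Phi)^{-1} Z^T Y, phi_k = -(y_{k-1..k-n_a}), z_k = (y_{k-p-1..k-p-n_a})."""
    A = [[0.0] * n_a for _ in range(n_a)]
    rhs = [0.0] * n_a
    for k in range(p + n_a, len(y)):
        z = [y[k - p - i] for i in range(1, n_a + 1)]
        phi = [-y[k - j] for j in range(1, n_a + 1)]
        for i in range(n_a):
            rhs[i] += z[i] * y[k]                   # accumulates Z^T Y
            for j in range(n_a):
                A[i][j] += z[i] * phi[j]            # accumulates Z^T Phi
    return solve(A, rhs)


rng = random.Random(9)
n = 20_000
y = [0.0, 0.0]
for k in range(2, n):
    # A(z^-1) y_k = w_k with a_1 = -1.5, a_2 = 0.7
    y.append(1.5 * y[k - 1] - 0.7 * y[k - 2] + rng.gauss(0.0, 1.0))

print("p = 0:", [round(a, 3) for a in ar_iv_estimate(y, 2, p=0)])
print("p = 3:", [round(a, 3) for a in ar_iv_estimate(y, 2, p=3)])
```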


6.6 SOME ASPECTS OF APPLICATION 

An important problem is the sensitivity of least-squares solutions to outliers and other abnormal data. A remedy is to inspect the data and to exclude the outliers before further signal processing. A popular method in this context is a median filter applied to a sliding time window along the data series. The median filter thus selects the median value of the data in the time window under consideration. Unfortunately, however, the median filter may introduce artifacts in the form of time-delay variations in the input-output relationship, or even apparent noncausal behavior. The latter problem appears when a median filter is applied to rapidly varying data, e.g., during the initial phase of a step response.
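A minimal sliding-window median filter as an illustration. The centered window is an assumption (one common choice); it uses future samples, which is one source of the apparent noncausal behavior mentioned above, and the window is truncated at the ends of the record.

```python
from statistics import median


def median_filter(x, width=5):
    """Centered sliding-window median; the window is truncated at both ends."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    n = len(x)
    return [median(x[max(0, k - half):min(n, k + half + 1)]) for k in range(n)]


data = [0.0, 0.1, 0.0, 8.0, 0.1, 0.0, 0.1]      # an isolated outlier at k = 3
print(median_filter(data, width=3))
```

An isolated outlier is removed, while a clean step passes through unchanged, which is why the median filter is preferred over linear smoothing for this purpose.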

Prefiltering, smoothing, and prewhitening by data filtering are often effective in reducing bias, provided that the filters are appropriately chosen with respect to the purpose of identification. A common problem is the compensation of known periodic variations, as may occur due to seasonal effects in time-series data, e.g., in economics, meteorology, hydrology, or ecology. Annual variations in monthly data can be compensated for by means of the filtering

$$ Y^f(z) = (1 - z^{-12})\, Y(z), \qquad U^f(z) = (1 - z^{-12})\, U(z) \tag{6.72} $$

and application of the modified model

$$ \mathcal{M}': \; A(z^{-1})\, y_k^f = B(z^{-1})\, u_k^f + v_k \tag{6.73} $$

More generally, variation in sampled data with periodic behavior of period $d$ samples can sometimes be compensated by introducing the filter

$$ F(z^{-1}) = 1 - z^{-d}, \qquad d = \text{period of trend} \tag{6.74} $$

Obviously this filter also removes constant offsets from data, since $F(1) = 0$.
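A minimal sketch of the filter (6.74) in the time domain, $x^f_k = x_k - x_{k-d}$. The first $d$ samples, for which no value $d$ steps back exists, are simply dropped (an assumption; other start-up conventions are possible), and the monthly test signal with an annual cycle and a constant offset is illustrative.

```python
import math


def seasonal_difference(x, d):
    """Apply F(z^-1) = 1 - z^-d: returns x_k - x_{k-d} for k = d, ..., len(x) - 1."""
    return [x[k] - x[k - d] for k in range(d, len(x))]


months = range(48)
x = [5.0 + 2.0 * math.sin(2.0 * math.pi * m / 12.0) for m in months]   # offset + annual cycle
xf = seasonal_difference(x, 12)
print(max(abs(v) for v in xf))     # of the order of rounding error
```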


Bias reduction 

As can be seen from Example 5.5 and Example 5.7, several methods are based on some kind of variance minimization that involves a trade-off between bias and variance. Bias in the parameter estimates naturally constitutes a serious problem, and there are several ad hoc methods for bias reduction, such as trend elimination, differentiation of data, and bias estimation implemented by estimation of an extra parameter.

Trend elimination of order $n$ is made by subtracting a polynomial of order $n$ from the data. The polynomial is most commonly found as a least-squares fit to the data. Standard use of this method is usually limited to subtraction of the mean value of the data (0th-order trend elimination) or elimination of linear trends as obtained by linear regression.
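A sketch of 0th- and 1st-order trend elimination, with the straight line fitted by least squares in closed form; the time index $t = 0, 1, \dots$ is an assumption of this illustration.

```python
def remove_trend(x, order=1):
    """Subtract an LS-fitted polynomial of order 0 (the mean) or 1 (a straight line)."""
    n = len(x)
    x_mean = sum(x) / n
    if order == 0:
        return [v - x_mean for v in x]
    if order != 1:
        raise ValueError("only order 0 or 1 is handled in this sketch")
    t_mean = (n - 1) / 2
    # Least-squares slope of x against t
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return [v - x_mean - slope * (t - t_mean) for t, v in enumerate(x)]


print(remove_trend([1.0, 3.0, 2.0, 4.0], order=1))
```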

Example 6.11—Trend elimination 

Consider Example 5.5, where the noise offset $\mathrm{E}\{e_k\} = 1$ causes a serious bias problem. Computation of the sample means

$$ \bar y = \frac{1}{N}\sum_{k=1}^{N} y_k, \qquad \bar u = \frac{1}{N}\sum_{k=1}^{N} u_k \tag{6.75} $$

and their subtraction from the sequences $\{y_k\}$ and $\{u_k\}$ results in the modified model

$$ \mathcal{M}': \; A(z^{-1})(y_k - \bar y) = B(z^{-1})(u_k - \bar u) + v_k \tag{6.76} $$

Parameter estimation of a first-order model based on the same data as in Example 5.5 gives the result

$$ \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} 0.9066 \\ 0.1144 \end{pmatrix} \tag{6.77} $$

which exhibits a clear improvement over Example 5.5 with respect to the bias magnitude. ■

Example 6.12 —Differentiation of data 
Consider the model of Example 5.5 


yk = ayk-i + bu k _ 1 + v k ; = v 0 / 0 (6.78) 

Straightforward application of discrete-time difference operator A = 1 — z~ l 
gives 

M-' \ Ayk = aAyk_i + bAuk-i + Au*, where A = 1 -z~ l (6.79) 

The noise components now have the expected mean £{At/*} =‘E{v k - = 
0, which eliminates the offset in the noise. However, application of least- 
squares identification gives the disappointing result 

2 fa) f -0.0894 ) 

* ' l 5 J ' 1 0 0852 J (6 - 8 °) 

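which should be compared to the correct values $a = 0.9$ and $b = 0.1$. Differentiation solves one problem but, unfortunately, introduces new noise correlations, which in turn cause new bias in the parameter estimates: if $v_k = v_0 + w_k$ with $\{w_k\}$ white, the differenced noise $\Delta v_k = w_k - w_{k-1}$ is a moving average that is correlated with the regressor $\Delta y_{k-1}$, which contains $w_{k-1}$. Maximum-likelihood identification of the model

$$\Delta y_k = a\,\Delta y_{k-1} + b\,\Delta u_{k-1} + w_k + c\,w_{k-1} \qquad (6.81)$$

gives the result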


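$$\hat\theta = \begin{pmatrix}\hat a\\ \hat b\\ \hat c\end{pmatrix} = \begin{pmatrix}0.8582\\ 0.0522\\ -0.9243\end{pmatrix} \;\Rightarrow\; \tilde\theta = \hat\theta - \theta = \begin{pmatrix}-0.0418\\ -0.0478\\ -0.0757\end{pmatrix} \qquad (6.82)$$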


which clearly provides improved accuracy compared with the ordinary least-squares identification in Example 5.5 and, of course, better results than the estimate (6.80) obtained from straightforward differentiation and least-squares estimation. ■
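The bias introduced by differentiation can be reproduced by simulation. The self-contained sketch below assumes white Gaussian input and noise of unit variance, a = 0.9, b = 0.1, v_0 = 1, and an arbitrary seed (not the data of Example 5.5), and applies least squares to the differenced sequences of (6.79); the maximum-likelihood fit of (6.81) is not attempted here.

```python
import random


def simulate(N, a=0.9, b=0.1, v0=1.0, seed=2):
    """Simulate y_k = a*y_{k-1} + b*u_{k-1} + v0 + w_k with assumed white N(0,1) u and w."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [0.0] * N
    for k in range(1, N):
        y[k] = a * y[k - 1] + b * u[k - 1] + v0 + rng.gauss(0.0, 1.0)
    return u, y


def ls_first_order(u, y):
    """Least-squares (a, b) for y_k = a*y_{k-1} + b*u_{k-1} + e_k via the 2x2 normal equations."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for k in range(1, len(y)):
        p, q = y[k - 1], u[k - 1]
        s_yy += p * p
        s_yu += p * q
        s_uu += q * q
        r_y += p * y[k]
        r_u += q * y[k]
    det = s_yy * s_uu - s_yu ** 2
    return (s_uu * r_y - s_yu * r_u) / det, (s_yy * r_u - s_yu * r_y) / det


def difference(x):
    """Apply the difference operator Delta = 1 - z^{-1}: returns x_k - x_{k-1}."""
    return [x[k] - x[k - 1] for k in range(1, len(x))]


u, y = simulate(20000)
a_hat, b_hat = ls_first_order(difference(u), difference(y))
print("differentiation + LS: a=%.4f b=%.4f" % (a_hat, b_hat))
```

The offset disappears, but â lands far below 0.9 because the moving-average noise w_k - w_{k-1} is correlated with Δy_{k-1}.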
Table 6.1 A comparison of various methods used in Example 6.13. Notice that there is no obvious relationship between σ² and ||θ̃||₂.

Method               a         b        Offset   V     σ²       ||θ̃||₂
Ordinary LS          0.9829    0.1550   -        531   1.0642   0.0995
Differentiation+LS   -0.0894   0.0852   -        522   1.0467   0.9895
Differentiation+ML   0.8582    0.0522   -        515   1.0304   0.0635
Trend elimination    0.9066    0.1144   -        552   1.1073   0.0157
Offset estimation    0.9066    0.1143   0.9111   499   1.0015   0.0157


Example 6.13—Offset estimation via an extra parameter 
Assume that the noise component $\{v_k\}$ can be decomposed as

$$v_k = v_0 + w_k \qquad (6.83)$$

where $\{w_k\}$ is a white-noise sequence and where $v_0 = \mathcal{E}\{v_k\}$ represents the constant nonzero offset in the noise. Introduction of an extra parameter representing the offset yields the model


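$$\mathcal{M}':\; y_k = a\,y_{k-1} + b\,u_{k-1} + v_0 + w_k = \begin{pmatrix} y_{k-1} & u_{k-1} & 1\end{pmatrix}\begin{pmatrix} a\\ b\\ v_0\end{pmatrix} + w_k \qquad (6.84)$$

with the extended parameter vector $\theta' = (a \;\; b \;\; v_0)^T$. The least-squares estimate of $\theta' = (0.9 \;\; 0.1 \;\; 1.0)^T$ is

$$\hat\theta' = \begin{pmatrix}\hat a\\ \hat b\\ \hat v_0\end{pmatrix} = \begin{pmatrix}0.9066\\ 0.1143\\ 0.9111\end{pmatrix} \;\Rightarrow\; \tilde\theta' = \hat\theta' - \theta' = \begin{pmatrix}0.0066\\ 0.0143\\ -0.0889\end{pmatrix} \qquad (6.85)$$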


where the basic idea is to “absorb” the bias due to the constant offset component $v_0$ present in the noise sequence $\{v_k\}$.

As the figures in Table 6.1 demonstrate, offset estimation and trend elimination are superior to the other methods compared. Differentiation as used in Example 6.12 cannot be recommended in the context of least-squares identification, owing to the unavoidable noise correlation it introduces. ■
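Offset estimation via the extended regressor of (6.84) amounts to ordinary least squares with a constant column appended. A minimal sketch under assumed simulation conditions (white Gaussian input and noise, a = 0.9, b = 0.1, v_0 = 1, an arbitrary seed; not the data of Example 5.5) is:

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def simulate(N, a=0.9, b=0.1, v0=1.0, seed=3):
    """Simulate y_k = a*y_{k-1} + b*u_{k-1} + v0 + w_k with assumed white N(0,1) u and w."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [0.0] * N
    for k in range(1, N):
        y[k] = a * y[k - 1] + b * u[k - 1] + v0 + rng.gauss(0.0, 1.0)
    return u, y


u, y = simulate(5000)
# Extended regressor (y_{k-1}, u_{k-1}, 1) of model (6.84): the constant absorbs v0.
rows = [(y[k - 1], u[k - 1], 1.0) for k in range(1, len(y))]
targets = y[1:]
normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
a_hat, b_hat, v0_hat = solve(normal, rhs)
print("offset estimation: a=%.4f b=%.4f v0=%.4f" % (a_hat, b_hat, v0_hat))
```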


6.7 SOME REMARKS ON CONVERGENCE AND CONSISTENCY 


Asymptotic convergence theory is concerned with the behavior of random variables and parameter estimates as the sample size tends to infinity. For basic statistical definitions of convergence and consistency, the reader is referred to Appendix B, which also contains important theorems such as the Cramér-Rao lower bound, the central limit theorem, and calculation rules for the probability limit. The application of these methods to convergence analysis can be illustrated, for instance, by means of the least-squares estimate

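$$\hat\theta_N = (\Phi_N^T\Phi_N)^{-1}\Phi_N^T Y_N = \theta + (\Phi_N^T\Phi_N)^{-1}\Phi_N^T\varepsilon \qquad (6.86)$$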


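Assume that the disturbance sequence $\varepsilon$ fulfills the white-noise assumption $\mathcal{E}\{\varepsilon\varepsilon^T\} = \sigma^2 I$. Then, if the regressors and disturbances are uncorrelated and $\operatorname{plim}\frac{1}{N}\Phi_N^T\Phi_N$ exists and is invertible, the calculation rules for the probability limit show that least-squares identification is consistent in probability:

$$\operatorname{plim}\hat\theta_N = \theta + \Big(\operatorname{plim}\frac{1}{N}\Phi_N^T\Phi_N\Big)^{-1}\operatorname{plim}\Big(\frac{1}{N}\Phi_N^T\varepsilon\Big) = \theta + 0 = \theta \qquad (6.87)$$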

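with the asymptotic covariance

$$\operatorname{Cov}\{\sqrt{N}(\hat\theta_N - \theta)\} = \operatorname{plim}\{N(\hat\theta_N - \theta)(\hat\theta_N - \theta)^T\} = \sigma^2\Big(\operatorname{plim}\frac{1}{N}\Phi_N^T\Phi_N\Big)^{-1} \qquad (6.88)$$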
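In finite samples, (6.88) suggests approximating the covariance of θ̂_N by σ̂²(Φ_N^TΦ_N)^{-1}, with σ̂² estimated from the residuals. The sketch below applies this to simulated data from a first-order model without offset; the model, the parameter values, the unit noise variance, and the function names are assumptions made for illustration.

```python
import math
import random


def simulate(N, a=0.9, b=0.1, seed=4):
    """Simulate y_k = a*y_{k-1} + b*u_{k-1} + w_k with assumed white N(0,1) u and w."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [0.0] * N
    for k in range(1, N):
        y[k] = a * y[k - 1] + b * u[k - 1] + rng.gauss(0.0, 1.0)
    return u, y


def ls_with_covariance(u, y):
    """LS estimate of (a, b) and the covariance estimate sigma^2 (Phi^T Phi)^{-1}."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for k in range(1, len(y)):
        p, q = y[k - 1], u[k - 1]
        s_yy += p * p
        s_yu += p * q
        s_uu += q * q
        r_y += p * y[k]
        r_u += q * y[k]
    det = s_yy * s_uu - s_yu ** 2
    a_hat = (s_uu * r_y - s_yu * r_u) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    n = len(y) - 1
    sse = sum((y[k] - a_hat * y[k - 1] - b_hat * u[k - 1]) ** 2 for k in range(1, len(y)))
    sigma2 = sse / (n - 2)                                           # two parameters fitted
    inv = [[s_uu / det, -s_yu / det], [-s_yu / det, s_yy / det]]     # (Phi^T Phi)^{-1}
    cov = [[sigma2 * inv[i][j] for j in range(2)] for i in range(2)]
    return (a_hat, b_hat), cov


u, y = simulate(2000)
(a_hat, b_hat), cov = ls_with_covariance(u, y)
se_a, se_b = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
print("a = %.4f +/- %.4f, b = %.4f +/- %.4f" % (a_hat, se_a, b_hat, se_b))
```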

A problem related to consistency is the question of the limiting distributions of the parameter estimates. A valuable theorem in this context is the central limit theorem (see Appendix B), which can be applied to a consistent parameter estimate if it can be regarded as a sum of independent stochastic variables. Direct application of the central limit theorem to a consistent least-squares estimator gives

$$\sqrt{N}(\hat\theta_N - \theta) \xrightarrow{d} \mathcal{N}(0, P_\theta), \qquad P_\theta = \sigma^2\Big(\mathcal{E}\Big\{\frac{1}{N}\Phi_N^T\Phi_N\Big\}\Big)^{-1} \qquad (6.89)$$


Hence, the limiting distribution of $\sqrt{N}(\hat\theta_N - \theta)$ is normal. In addition, it turns out that the estimated parameters converge at a rate proportional to $1/\sqrt{N}$ and that the corresponding covariance converges at a rate proportional to $1/N$.
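The 1/√N rate can be checked by Monte Carlo simulation. The sketch below, which assumes a first-order model with zero-mean white noise, a = 0.9, b = 0.1, and an arbitrary number of replications, compares the root-mean-square error of â for N = 100 and N = 1600; a 16-fold increase of N should reduce the error by a factor of about 4, up to small-sample bias and Monte Carlo scatter.

```python
import math
import random


def ls_a_estimate(N, rng, a=0.9, b=0.1):
    """Simulate N samples of y_k = a*y_{k-1} + b*u_{k-1} + w_k and return the LS estimate of a."""
    y_prev, u_prev = 0.0, rng.gauss(0.0, 1.0)
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for _ in range(N):
        y = a * y_prev + b * u_prev + rng.gauss(0.0, 1.0)
        s_yy += y_prev * y_prev
        s_yu += y_prev * u_prev
        s_uu += u_prev * u_prev
        r_y += y_prev * y
        r_u += u_prev * y
        y_prev, u_prev = y, rng.gauss(0.0, 1.0)
    det = s_yy * s_uu - s_yu ** 2
    return (s_uu * r_y - s_yu * r_u) / det


def rms_error(N, replications=200, seed=0, a=0.9):
    """Root-mean-square error of the LS estimate of a over independent replications."""
    rng = random.Random(seed)
    errors = [ls_a_estimate(N, rng, a) - a for _ in range(replications)]
    return math.sqrt(sum(e * e for e in errors) / replications)


ratio = rms_error(100) / rms_error(1600)
print("RMS error ratio for a 16-fold increase of N: %.2f (1/sqrt(N) predicts 4)" % ratio)
```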

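Another valuable result for the analysis of limiting distributions is the following. Suppose that $\sqrt{N}(\hat\theta_N - \theta) \xrightarrow{d} \mathcal{N}(0, P_\theta)$, that $f(\theta): \mathbb{R}^n \to \mathbb{R}^m$ is a continuously differentiable function, and that $\partial f/\partial\theta_i \neq 0$ for all $i = 1, 2, \ldots, n$. Then it can be shown that

$$\sqrt{N}\big(f(\hat\theta_N) - f(\theta)\big) \xrightarrow{d} \mathcal{N}\Big(0, \Big(\frac{\partial f}{\partial\theta}\Big)^T P_\theta \Big(\frac{\partial f}{\partial\theta}\Big)\Big) \qquad (6.90)$$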
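A typical use of (6.90) is to propagate parameter uncertainty to a derived quantity. As a hypothetical example, take the static gain K = f(θ) = b/(1 - a) of the first-order model y_k = a y_{k-1} + b u_{k-1} + v_k, and use an assumed finite-sample covariance of θ̂ in place of P_θ/N. The sketch evaluates (∂f/∂θ)^T Cov{θ̂} (∂f/∂θ) with a central-difference gradient; the numerical values are illustrative only.

```python
def delta_method_variance(f, theta, cov, h=1e-6):
    """Approximate Var{f(theta_hat)} by grad^T Cov{theta_hat} grad (scalar f, central differences)."""
    n = len(theta)
    grad = []
    for i in range(n):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))


def static_gain(theta):
    """Static gain b / (1 - a) of y_k = a*y_{k-1} + b*u_{k-1} (hypothetical example)."""
    a, b = theta
    return b / (1.0 - a)


theta_hat = [0.9, 0.1]                       # assumed estimate
cov_theta = [[1e-4, 0.0], [0.0, 1e-4]]       # assumed covariance of theta_hat
var_gain = delta_method_variance(static_gain, theta_hat, cov_theta)
print(static_gain(theta_hat), var_gain)      # the gradient is (10, 10), so var_gain is about 0.02
```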

If $\hat\theta_N$ is found by applying the numerical method (6.43) to an optimization criterion $V_N(\theta)$, such as the log-likelihood function or some prediction-error criterion based on the sample covariance, then we obtain the following Taylor series expansion at the minimum:

$$0 = \nabla V_N(\hat\theta_N) = \nabla V_N(\theta) + \nabla^2 V_N(\theta)(\hat\theta_N - \theta) + \text{higher-order terms} \qquad (6.91)$$

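Consistency in probability of $\hat\theta_N$ follows if $\operatorname{plim}\{\nabla V_N(\theta)\} = 0$, provided that $\operatorname{plim}\{\nabla^2 V_N(\theta)\}$ exists and is invertible (i.e., a unique parametrization is required). If the higher-order terms in the Taylor series expansion are neglected, we also obtain the approximation

$$\operatorname{plim}\big(\sqrt{N}(\hat\theta_N - \theta)\big) \approx -\big(\operatorname{plim}\nabla^2 V_N(\theta)\big)^{-1}\operatorname{plim}\big(\sqrt{N}\,\nabla V_N(\theta)\big) \qquad (6.92)$$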

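Provided that $\operatorname{plim}\{\nabla^2 V_N(\theta)\}$ exists and is invertible, i.e., if $V_N$ has a unique global minimum at $\theta$ (so that a unique parametrization is required), it may be concluded that

$$\sqrt{N}(\hat\theta_N - \theta) \xrightarrow{d} \mathcal{N}\big(0,\; (\nabla^2 V_N(\theta))^{-1} P\, (\nabla^2 V_N(\theta))^{-1}\big), \qquad P = \lim_{N\to\infty} N\,\mathcal{E}\{(\nabla V_N(\theta))(\nabla V_N(\theta))^T\} \qquad (6.93)$$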

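which can also be motivated by reference to Eq. (6.90). Moreover, by reference to Eq. (6.43), we may suggest the estimates

$$\nabla^2 V_N(\hat\theta_N) = \frac{1}{N}\sum_{k=1}^{N}\psi_k(\hat\theta_N)\psi_k^T(\hat\theta_N) + \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k(\hat\theta_N)\nabla^2\varepsilon_k(\hat\theta_N)$$

$$\hat P(\hat\theta_N) = \frac{1}{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\varepsilon_j(\hat\theta_N)\varepsilon_k(\hat\theta_N)\psi_j(\hat\theta_N)\psi_k^T(\hat\theta_N) \qquad (6.94)$$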
Assuming uncorrelated residuals and neglecting the terms proportional to $\nabla^2\varepsilon$, this can be simplified to

$$\hat P_\theta = \hat\sigma^2\Big(\frac{1}{N}\sum_{k=1}^{N}\psi_k(\hat\theta_N)\psi_k^T(\hat\theta_N)\Big)^{-1}, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k^2(\hat\theta_N) \qquad (6.95)$$


as an estimate of the parameter covariance matrix. Similar results on almost sure consistency can be shown for identification of ARMAX models with normally distributed disturbances and a unique parametrization. However, as all results obtained from limiting distributions are valid only asymptotically, undue emphasis should not be placed on precise levels of significance for finite data series.
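For a linear-in-parameters predictor, the gradient ψ_k equals the regressor and the ∇²ε_k terms vanish, so (6.94) and (6.95) can be compared directly. Note that at θ̂_N the full double sum in (6.94) collapses, because Σ ε_k ψ_k = 0 at the minimum of a least-squares criterion; the sketch therefore keeps only the j = k terms, which corresponds to the uncorrelated-residual assumption. The simulated first-order model, its parameter values, and the helper names are assumptions; with white, homoscedastic residuals the two estimates should nearly coincide.

```python
import random


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


rng = random.Random(5)
N, a, b = 5000, 0.9, 0.1
u = [rng.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = a * y[k - 1] + b * u[k - 1] + rng.gauss(0.0, 1.0)

# For a linear predictor the gradient psi_k is the regressor (y_{k-1}, u_{k-1}).
psi = [(y[k - 1], u[k - 1]) for k in range(1, N)]
target = y[1:]
n = len(psi)
H = [[sum(p[i] * p[j] for p in psi) / n for j in range(2)] for i in range(2)]  # Hessian estimate
g = [sum(p[i] * t for p, t in zip(psi, target)) / n for i in range(2)]
H_inv = inv2(H)
theta = [H_inv[i][0] * g[0] + H_inv[i][1] * g[1] for i in range(2)]           # LS minimizer
eps = [t - theta[0] * p[0] - theta[1] * p[1] for p, t in zip(psi, target)]
sigma2 = sum(e * e for e in eps) / n

# Simplified estimate (6.95): sigma^2 * H^{-1}.
P_simple = [[sigma2 * H_inv[i][j] for j in range(2)] for i in range(2)]
# Sandwich H^{-1} S H^{-1} with only the j = k terms of the double sum in (6.94).
S = [[sum(e * e * p[i] * p[j] for e, p in zip(eps, psi)) / n for j in range(2)] for i in range(2)]
P_sandwich = mul2(mul2(H_inv, S), H_inv)
print(P_simple)
print(P_sandwich)
```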


6.8 CONCLUDING REMARKS 

Differentiation as used in Example 6.12 cannot be recommended in the context of least-squares identification, owing to the unavoidable noise correlation involved. Trend elimination and offset estimation both tend to give better results. Table 6.1 (Example 6.13) shows this clearly, despite the contrary evidence suggested by the least-squares loss function: trend elimination gives the largest loss V of all the methods but one of the two smallest parameter errors. In this context it should be borne in mind that the parameter error and the least-squares loss function are not related in any simple manner. In fact, it is generally difficult to evaluate the quantitative behavior of convergence in mean square for standard linear autoregressive moving-average models.
6.10 EXERCISES

6.1 Consider the system

$$\mathcal{S}:\; y_k + a\,y_{k-1} = b\,u_{k-1} + w_k + c\,w_{k-1} \qquad (6.96)$$

where $\{u_k\}$ and $\{w_k\}$ are independent zero-mean white-noise processes with the variances $\mathcal{E}\{u_k^2\} = \sigma_u^2$ and $\mathcal{E}\{w_k^2\} = \sigma^2$, respectively. What are the asymptotic parameter estimates for a large number N of observations when a model in the model set

$$\mathcal{M}:\; y_k + a\,y_{k-1} = b\,u_{k-1} \qquad (6.97)$$

is fitted to the data?

6.2 Show that the following numerical algorithms converge toward the minimum at $x = -Q_2^{-1}q_1$ for certain choices of the step length $\alpha_i$ when applied to a quadratic function

$$V(x) = \frac{1}{2}x^T Q_2 x + x^T q_1 + q_0 \qquad (6.98)$$

where $x \in \mathbb{R}^n$, the matrix $Q_2 = Q_2^T > 0$, $q_1 \in \mathbb{R}^n$, and the scalar $q_0 \in \mathbb{R}$.

i. The Newton-Raphson algorithm

$$x^{(i+1)} = x^{(i)} - \alpha_i\big(V''(x^{(i)})\big)^{-1}V'(x^{(i)}) \qquad (6.99)$$

ii. The Levenberg-Marquardt algorithm

$$x^{(i+1)} = x^{(i)} - \alpha_i\big(\alpha_i I_{n\times n} + V''(x^{(i)})\big)^{-1}V'(x^{(i)}) \qquad (6.100)$$

iii. The Gauss-Newton algorithm

$$x^{(i+1)} = x^{(i)} - \alpha_i R\,V'(x^{(i)}) \qquad (6.101)$$

where R is a constant positive definite weighting matrix.

Determine conditions on R and $\alpha_i$ for the algorithm to converge.
Hint: Define the error at iteration i as $e^{(i)} = x^{(i)} - x^*$, where $x^* = -Q_2^{-1}q_1$ is the minimum, and then consider the error norm

$$\|e^{(i)}\|_P^2 = (e^{(i)})^T P\,e^{(i)} \qquad (6.102)$$

for some positive definite matrix P, and take

$$\|e^{(i+1)}\|_P < \|e^{(i)}\|_P \qquad (6.103)$$

as a criterion of convergence.

6.3 Consider a moving average (MA) process $y_k = b_1u_{k-1} + \cdots + b_mu_{k-m} + v_k$, and show that the process parameters $b_1, \ldots, b_m$ can be consistently estimated using least-squares identification even in the presence of colored noise $\{v_k\}$.

6.4 Modify the maximum-likelihood identification method for statistically independent perturbations distributed according to the Rayleigh distribution with the asymmetric probability density function

$$f(x) = \frac{x}{\sigma^2}\,e^{-x^2/(2\sigma^2)}, \qquad x \geq 0 \qquad (6.104)$$


6.5 Consider the system

$$\mathcal{S}:\; y_{k+1} = a(u_k + w_k), \qquad w_k \in \mathcal{N}(0, \sigma^2), \qquad \mathcal{E}\{w_iw_j\} = \sigma^2\delta_{ij} \qquad (6.105)$$

where $\{w_k\}$ is a zero-mean white-noise sequence and where $\{u_k\}$, $\{y_k\}$ are measured variables. How can a be estimated optimally?

6.6 Formulate the instrumental variable equations on the augmented system form (5.104). Show that



(6.106) 


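6.7 Show that the instrumental variable estimate can be written

$$\begin{pmatrix}\varepsilon\\ \hat\theta\end{pmatrix} = \begin{pmatrix} I - \Phi_N(Z^T\Phi_N)^{-1}Z^T & \Phi_N(Z^T\Phi_N)^{-1}\\ (Z^T\Phi_N)^{-1}Z^T & -(Z^T\Phi_N)^{-1}\end{pmatrix}\begin{pmatrix} Y_N\\ 0\end{pmatrix} \qquad (6.107)$$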
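6.8 Consider the instrumental variable method and show by means of Eq. (6.107) that

$$\begin{pmatrix}\varepsilon\\ \hat\theta - \theta\end{pmatrix} = \begin{pmatrix} I & \Phi_N\\ Z^T & 0_{p\times p}\end{pmatrix}^{-1}\begin{pmatrix} v\\ 0\end{pmatrix}, \qquad \mathcal{E}\Big\{\begin{pmatrix} v\\ 0\end{pmatrix}\begin{pmatrix} v^T & 0\end{pmatrix}\Big\} = \begin{pmatrix}\Sigma_v & 0_{N\times p}\\ 0_{p\times N} & 0_{p\times p}\end{pmatrix} \qquad (6.108)$$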


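Use this result to justify the covariance estimate

$$\operatorname{Cov}\Big\{\begin{pmatrix}\varepsilon\\ \hat\theta\end{pmatrix}\Big\} = \begin{pmatrix} I & \Phi_N\\ Z^T & 0\end{pmatrix}^{-1}\begin{pmatrix}\Sigma_v & 0_{N\times p}\\ 0_{p\times N} & 0_{p\times p}\end{pmatrix}\begin{pmatrix} I & \Phi_N\\ Z^T & 0\end{pmatrix}^{-T} = \begin{pmatrix} P\Sigma_vP^T & P\Sigma_vR^T\\ R\Sigma_vP^T & R\Sigma_vR^T\end{pmatrix} \qquad (6.109)$$

for $P = I - \Phi_N(Z^T\Phi_N)^{-1}Z^T$ and $R = (Z^T\Phi_N)^{-1}Z^T$.

6.9 Consider the functions

$$f_1(A) = \log\det A, \qquad f_2(A) = \operatorname{tr}(WA) \qquad (6.110)$$

where W is a symmetric positive definite weighting matrix. Show that $f_1$ and $f_2$ are suitable as scalar optimization criteria when applied to sample covariance matrices. Show that these functions are such that $f_1(A) > f_1(A_0)$ and $f_2(A) > f_2(A_0)$ for square symmetric matrices A and $A_0$ satisfying the inequality $A > A_0 > 0$.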

6.10 Consider the system 


$$y_k = -a\,y_{k-1} + b\,u_{k-1} + v_k \qquad (6.111)$$

where $\{v_k\}$ is a sequence of independent identically distributed stochastic variables, each with the probability density function f(x) =

Design a maximum-likelihood method that permits estimation of a and b.




Modeling 


7.1 INTRODUCTION 

Although modeling in a wider sense of the term is universally applicable, not only in technology and in such sciences as physics, chemistry, and biology but also in fields such as economics and the social sciences, this chapter will be confined to annotated examples drawn from the natural sciences. Moreover, several of the important modeling approaches that we have already discussed will not be commented on further here. A suitable starting point is to consider some standard modeling procedures of importance in the context of identification.
7.2 MECHANICAL SYSTEMS

The essential mechanics involved in the motion of a particle is contained in Newton's second law of motion, which defines force and mass. For a single particle this law is expressed thus:

$$F = \frac{dp}{dt} = \frac{d}{dt}(m\dot q) = m\ddot q \qquad (7.1)$$

where p is the linear momentum of the particle, q its position, $\dot q$ its velocity, $\ddot q$ its acceleration, m its (constant) mass, and F the total force acting on the particle. (We use the standard notation of mechanics, where $\dot q$ and $\ddot q$ designate the time derivatives $dq/dt$ and $d^2q/dt^2$, respectively.) An essential aspect is the conservation of momentum and energy (kinetic energy + potential energy) when no external forces act on the system. In generalizing these ideas to systems of particles and mechanical structures, it is necessary to distinguish between external forces acting on the particles from outside the system and internal forces exerted, say, on some particle i by all other particles in the system. Moreover, the presence of constraints, such as fixed distances between some particles, limits the motion.

Example 7.1—Newtonian mechanics 

In Newtonian mechanics modeling starts with a force equation. Consider, for instance, an Atwood machine with two weights $m_1$ and $m_2$ and a frictionless pulley (Fig. 7.1). Let the vertical positions of the weights, measured downward, be denoted $x_1$ and $x_2$, and let the tension of the rope be designated X. Then

$$m_1\ddot x_1 = m_1g - X, \qquad m_2\ddot x_2 = m_2g - X \qquad (7.2)$$

The rope connecting the two weights constitutes a motion constraint such that

$$x_1 + x_2 - l = 0 \qquad (7.3)$$

where the constant l is determined by the length of the rope. This constraint places a restriction on the velocities and accelerations of $m_1$ and $m_2$ such that $\dot x_1 = -\dot x_2$ and $\ddot x_1 = -\ddot x_2$. Elimination of $\ddot x_1$ and $\ddot x_2$ from Eq. (7.2) gives the tension of the rope

$$X = \frac{2m_1m_2}{m_1 + m_2}\,g \qquad (7.4)$$


To surmount the problem that the forces of constraint are unknown a priori, it is desirable to formulate the mechanics so that the forces of constraint do not appear (d'Alembert's principle: the constraint forces do no virtual work). This leads, via variational calculus based on modeling of energy, to Lagrangian mechanics. The following Euler-Lagrange equations may be obtained by considering the kinetic energy T, the potential energy U, and the Lagrangian $L = T - U$:

$$\frac{d}{dt}\Big(\frac{\partial L}{\partial\dot q}\Big) - \frac{\partial L}{\partial q} = \tau \qquad (7.5)$$

where $\tau$ denotes external forces acting on the system and q the generalized coordinates. ■


Example 7.2—Lagrangian mechanics 

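Consider the kinetic energy T and the potential energy U of the masses of the Atwood machine (Fig. 7.1)

$$U = -m_1gx - m_2g(l - x), \qquad T = \frac{1}{2}(m_1 + m_2)\dot x^2 \qquad (7.6)$$

where $x = x_1$ denotes the position of $m_1$, so that $x_2 = l - x$ by Eq. (7.3). The Euler-Lagrange equations based on $L = T - U$ give

$$\frac{\partial L}{\partial x} = (m_1 - m_2)g, \qquad \frac{d}{dt}\frac{\partial L}{\partial\dot x} = \frac{d}{dt}\big((m_1 + m_2)\dot x\big) = (m_1 + m_2)\ddot x \qquad (7.7)$$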
so that

$$\ddot x = \frac{m_1 - m_2}{m_1 + m_2}\,g \qquad (7.8)$$

The constraint force, i.e., the tension of the rope, does not appear in these equations, which simplifies the calculation. ■

Figure 7.2 A robot with two arm segments of lengths $l_1$ and $l_2$, two point masses $m_1$ and $m_2$, and angular coordinates $q_1$ and $q_2$.
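As a numerical check on Examples 7.1 and 7.2, the Newtonian result (7.4) and the Lagrangian result (7.8) can be evaluated together and substituted back into the force equations (7.2). The masses and the value of g below are assumptions for illustration.

```python
G_ACC = 9.81  # assumed gravitational acceleration [m/s^2]


def atwood(m1, m2, g=G_ACC):
    """Acceleration of m1 (positive downward) and rope tension of an ideal Atwood machine."""
    acc = (m1 - m2) / (m1 + m2) * g           # Eq. (7.8), Lagrangian result
    tension = 2.0 * m1 * m2 / (m1 + m2) * g   # Eq. (7.4), Newtonian result
    return acc, tension


m1, m2 = 2.0, 1.0                             # hypothetical masses [kg]
acc, X = atwood(m1, m2)
# Both force equations (7.2) should hold with x1'' = acc and x2'' = -acc.
residual_1 = m1 * acc - (m1 * G_ACC - X)
residual_2 = m2 * (-acc) - (m2 * G_ACC - X)
print(acc, X, residual_1, residual_2)
```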


Example 7.3—Lagrangian mechanics for modeling of a robot 

Consider the robot in Fig. 7.2. The positions $(x_1, y_1)$ and $(x_2, y_2)$ of the segment end points, expressed in the Cartesian coordinates x and y, are

$$\begin{aligned} x_1 &= -l_1\sin q_1\\ y_1 &= l_1\cos q_1\\ x_2 &= -l_1\sin q_1 - l_2\sin q_2\\ y_2 &= l_1\cos q_1 + l_2\cos q_2 \end{aligned} \qquad (7.9)$$

where x is the horizontal coordinate and y the vertical coordinate, with their origin at the first joint. The corresponding velocities $v_1, v_2$ are


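$$v_1 = \begin{pmatrix}-l_1\cos q_1\,\dot q_1\\ -l_1\sin q_1\,\dot q_1\end{pmatrix}, \qquad v_2 = \begin{pmatrix}-l_1\cos q_1\,\dot q_1 - l_2\cos q_2\,\dot q_2\\ -l_1\sin q_1\,\dot q_1 - l_2\sin q_2\,\dot q_2\end{pmatrix} \qquad (7.10)$$

Introduce the shorter notation $c_1 = \cos q_1$, $s_2 = \sin q_2$, etc., and $q = (q_1 \;\; q_2)^T$. The potential energy is then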

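$$U(q) = m_1gl_1c_1 + m_2g(l_1c_1 + l_2c_2) \qquad (7.11)$$

The kinetic energy is

$$T = \frac{1}{2}m_1v_1^Tv_1 + \frac{1}{2}m_2v_2^Tv_2 \qquad (7.12)$$

which can be expressed in the angular coordinates as

$$T(q, \dot q) = \frac{1}{2}\dot q^TM(q)\dot q = \frac{1}{2}\begin{pmatrix}\dot q_1 & \dot q_2\end{pmatrix}\begin{pmatrix}(m_1 + m_2)l_1^2 & m_2l_1l_2(c_1c_2 + s_1s_2)\\ m_2l_1l_2(c_1c_2 + s_1s_2) & m_2l_2^2\end{pmatrix}\begin{pmatrix}\dot q_1\\ \dot q_2\end{pmatrix} \qquad (7.13)$$

where the matrix M(q) is the inertia matrix. Application of the Euler-Lagrange equations gives

$$M(q)\ddot q + C(q, \dot q)\dot q + G(q) = \tau \qquad (7.14)$$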


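where M(q) is the inertia matrix, $C(q, \dot q)\dot q$ the centripetal and Coriolis forces

$$C(q, \dot q)\dot q = m_2l_1l_2(c_1s_2 - s_1c_2)\begin{pmatrix}-\dot q_2^2\\ \dot q_1^2\end{pmatrix} \qquad (7.15)$$

and G(q) the gravitational forces involved

$$G(q) = \Big(\frac{\partial U}{\partial q}\Big)^T = g\begin{pmatrix}-(m_1 + m_2)l_1s_1\\ -m_2l_2s_2\end{pmatrix} \qquad (7.16)$$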
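One way to check a derived set of equations of motion is to simulate the unforced system (τ = 0) and verify that the total energy T + U stays constant. The sketch below does this for (7.13) to (7.16) with classical Runge-Kutta integration; the masses, lengths, initial state, and step size are hypothetical choices, and the energy drift should be at the level of the integration error only.

```python
import math

# Hypothetical parameter values for illustration (not taken from the text).
m1, m2, l1, l2, g = 1.0, 0.5, 1.0, 0.7, 9.81


def terms(q):
    """cos/sin of both angles and the off-diagonal inertia term of M(q), Eq. (7.13)."""
    c1, s1, c2, s2 = math.cos(q[0]), math.sin(q[0]), math.cos(q[1]), math.sin(q[1])
    return c1, s1, c2, s2, m2 * l1 * l2 * (c1 * c2 + s1 * s2)


def accelerations(q, dq, tau=(0.0, 0.0)):
    """Solve M(q) ddq = tau - C(q, dq) dq - G(q), Eq. (7.14), for ddq."""
    c1, s1, c2, s2, m12 = terms(q)
    M = [[(m1 + m2) * l1 ** 2, m12], [m12, m2 * l2 ** 2]]
    h = m2 * l1 * l2 * (c1 * s2 - s1 * c2)
    Cdq = [-h * dq[1] ** 2, h * dq[0] ** 2]                 # Eq. (7.15)
    G = [-(m1 + m2) * g * l1 * s1, -m2 * g * l2 * s2]       # Eq. (7.16)
    r = [tau[i] - Cdq[i] - G[i] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def energy(q, dq):
    """Total energy T + U from Eqs. (7.13) and (7.11)."""
    c1, s1, c2, s2, m12 = terms(q)
    T = 0.5 * ((m1 + m2) * l1 ** 2 * dq[0] ** 2 + 2 * m12 * dq[0] * dq[1]
               + m2 * l2 ** 2 * dq[1] ** 2)
    U = m1 * g * l1 * c1 + m2 * g * (l1 * c1 + l2 * c2)
    return T + U


def rk4_step(state, dt):
    """One classical Runge-Kutta step for the state (q1, q2, dq1, dq2) with tau = 0."""
    def f(x):
        return x[2:] + accelerations(x[:2], x[2:])
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (p + 2 * q + 2 * r + w)
            for s, p, q, r, w in zip(state, k1, k2, k3, k4)]


state = [2.5, 2.0, 0.0, 0.0]          # q1, q2 [rad], dq1, dq2 [rad/s]
E0 = energy(state[:2], state[2:])
for _ in range(2000):                 # 2 s with dt = 1 ms
    state = rk4_step(state, 1e-3)
E1 = energy(state[:2], state[2:])
print("energy drift:", E1 - E0)
```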
Lagrangian mechanics can be modified to include constraint forces, i.e., forces due to constraints that can be expressed in the form

$$c(q) = 0 \;\Rightarrow\; \Big(\frac{\partial c}{\partial q}\Big)^T\dot q = 0 \quad \text{(holonomic constraints)} \qquad (7.17)$$

This type of constraint depends on the position q only; such constraints are called holonomic constraints. The force generated by the constraint is

$$F = \lambda\,\frac{\partial c}{\partial q} \qquad (7.18)$$

for some scalar multiplier $\lambda$, acting at the constrained surface $c(q) = 0$, and the work done by this force is $dW = F^T\dot q\,dt = 0$, i.e., the constraint forces neither generate nor dissipate any energy. Holonomic constraints thus generate position-dependent, workless contact forces.


7.3 THERMODYNAMIC MODELING 


Thermodynamic modeling is widely used in chemistry, chemical engineering, and mechanical engineering to model heat transfer, fluid mechanics, and chemical reactions. Thermodynamic theory rests on a few fundamental laws:


0. If two systems are in thermal equilibrium with a third system, they must be in thermal equilibrium with each other.

1. An equilibrium state of a system can be characterized by a quantity E called internal energy, which has the property that E = constant for an isolated system.

2. An equilibrium state of a system can be characterized by a quantity S called entropy which, in any process in which a thermally isolated system goes from one state to another, tends to increase, i.e., $dS \geq 0$. If the system is not isolated and undergoes a process in which it absorbs heat dQ, then

$$dS \geq \frac{dQ}{T} \qquad (7.19)$$

where T is the absolute temperature of the system.
3. The entropy S of the system has the property that $S \to S_0$ when $T \to 0^+$, where $S_0$ is a constant.

Figure 7.3 A thermodynamic process, where Q is the heat absorbed by the system, W the work done by the system, and $E_{in}$ and $E_{out}$ the internal energy of the inflow and outflow, respectively.

Let Q denote the heat absorbed by the system and W the work done by the system. The system interacts with the environment by absorbing (or emitting) heat and by performing mechanical work according to the relationship

$$dQ = T\,dS = dE + dW \qquad (7.20)$$

Hence, in the relationship (7.20) the total energy is split into a part W due to mechanical interaction and a part Q due to thermal interaction. If for a given system the values of the state variables are independent of time, the system is said to be in thermodynamic equilibrium.

The number of state variables required to describe the process may be larger than the number required to describe the system at thermodynamic equilibrium, and there are several interdependent internal states (pressure p, volume V, entropy S, temperature T) whose evolution is determined by the conversion of various forms of energy. Transformation from one subset of variables to another can be accomplished by using Legendre transformations:

$$\begin{array}{llll}
\text{Internal energy} & E(S, V) & & dE = T\,dS - p\,dV\\
\text{Enthalpy} & H(S, p) & = E + pV, & dH = T\,dS + V\,dp\\
\text{Helmholtz free energy} & F(T, V) & = E - TS, & dF = -S\,dT - p\,dV\\
\text{Gibbs free energy} & G(T, p) & = E - TS + pV, & dG = -S\,dT + V\,dp
\end{array} \qquad (7.21)$$

For instance, if there is no thermal interaction with the environment, the work that can be done by the system in Fig. 7.3 is limited by the enthalpy relationship

$$W = (E_{in} + p_{in}V_{in}) - (E_{out} + p_{out}V_{out}) \qquad (7.22)$$

where enthalpy is the energy concept in which entropy and pressure are considered to be the independent variables.


7.4 COMPARTMENT MODELS 

Material flow and storage in biological systems is often modeled with a compartmental system, which can be defined as a system made up of a finite number of macroscopic subsystems, called compartments or pools, each of which is homogeneous and well mixed, that interact by exchanging materials. There may be inputs from the environment into one or more of the compartments, and there may be outputs (excretion) from one or more compartments into the environment. The compartments are depicted as boxes, and the exchange of materials from one compartment to another by an arrow indicating the direction of flow. Exchange rates between compartments are often indicated adjacent to the arrows. As compartments need not be physically separated, compartment models are also useful for modeling chemical reactions between components A, B, and C, such as

$$2\,\mathrm{H_2O} \rightleftharpoons 2\,\mathrm{H_2} + \mathrm{O_2} \qquad (7.23)$$

where the exchange rates represent chemical reaction rates. Compartment modeling is also an excellent means of modeling the dynamics of agents such as radioactive tracers mixed into some normal substance. The tracers should be chosen such that the system is unable to distinguish between the normal material and the tracer, and such that the tracer does not affect the steady state and the exchange rates of the carrier substance. The methodology is an established approach in many natural sciences such as chemistry, biochemistry, physiology and medicine, pharmacology, and the study of ecosystems in ecology. It is also used in demography to describe population growth and migration and the spread of epidemics.

Figure 7.4 Distribution of a drug when the drug is injected at an intramuscular site.

Figure 7.5 Transfer of a contagious disease in a contaminated environment.

Example 7.4—State-space model and compartment model 
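Consider the example in Fig. 7.4 of a model of the distribution of a drug injected at an intramuscular site. A state-space model with states $x_i$, $i = 1, 2, 3$, and exchange rates or transfer coefficients $r_i$ can be suggested as

$$\frac{d}{dt}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix}-r_1 & 0 & 0\\ r_1 & -r_2 - r_4 & r_3\\ 0 & r_2 & -r_3\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}, \qquad y = \begin{pmatrix}0 & r_4 & 0\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} \qquad (7.24)$$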


The state-space model (7.24) would, of course, with another interpretation of the exchange rates $r_i$, also express the compartment model of Fig. 7.5, which has the same number of compartments and similar interactions as that of Fig. 7.4. ■
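The transient responses of Fig. 7.6 can be reproduced qualitatively by integrating (7.24). The sketch below assumes r_1 = ... = r_4 = 1, as in the caption of Fig. 7.6, loads compartment x_1 with one unit of tracer, and also integrates the eliminated amount (the integral of y = r_4 x_2) so that the mass balance can be checked; the step size and horizon are arbitrary choices.

```python
def rhs(s, r):
    """Right-hand side of (7.24) augmented with the eliminated amount (integral of y)."""
    r1, r2, r3, r4 = r
    x1, x2, x3, _ = s
    return [-r1 * x1,
            r1 * x1 - (r2 + r4) * x2 + r3 * x3,
            r2 * x2 - r3 * x3,
            r4 * x2]                                  # y = r4 * x2 leaves the system


def simulate(x0, r, t_end, dt):
    """Classical RK4 integration from x(0) = x0; returns (x(t_end), eliminated amount)."""
    s = list(x0) + [0.0]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s, r)
        k2 = rhs([a + 0.5 * dt * k for a, k in zip(s, k1)], r)
        k3 = rhs([a + 0.5 * dt * k for a, k in zip(s, k2)], r)
        k4 = rhs([a + dt * k for a, k in zip(s, k3)], r)
        s = [a + dt / 6.0 * (p + 2 * q + 2 * w + z)
             for a, p, q, w, z in zip(s, k1, k2, k3, k4)]
    return s[:3], s[3]


x_end, eliminated = simulate([1.0, 0.0, 0.0], (1.0, 1.0, 1.0, 1.0), 50.0, 0.01)
print(x_end, eliminated, sum(x_end) + eliminated)   # last value: total tracer, conserved
```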

Figure 7.6 A three-compartment model described by Eq. (7.24) ($r_1 = \cdots = r_4 = 1$) with transient responses after loading of each one of the three compartments. The lower right diagram shows the response to a constant loading of compartment $x_1$.

The dynamics of the system are typically studied as transient responses from initial states (see Fig. 7.6). By adding a tracer to one compartment and recording the responses from each compartment available to measurement, it is possible to study the transfer between and among the compartments, and thus to determine the exchange parameters of the system. The associated impulse response g(t) can then be interpreted as the elimination rate at time t of the tracer material that entered the system at time t = 0. Assume the total amount of tracer to be m, the distribution volume for the tracer in the compartment system to be V, the constant outflow rate to be $i_{out}$, and the outflow concentration of the tracer to be c(t). The case of a compartment system with one outflow and no consumption or production of material can be expressed thus:


$$\int_0^\infty g(t)\,dt = 1, \quad g(t) \geq 0 \;\; \forall t; \qquad \int_0^\infty t\,g(t)\,dt = V/i_{out} \quad \text{(Stewart-Hamilton equation)} \qquad (7.25)$$


The inequality $g(t) \geq 0$ in (7.25) is motivated by the assumption that no tracer material enters the system via the outflow. The first equation is motivated by the fact that all the injected material will eventually leave the system via the outflow, whereas the second equation expresses the mean duration of the injected tracer's presence in the compartment system. The tracer elimination rate is then

$$i_{out}\,c(t) = m\,g(t), \qquad \text{so that} \qquad i_{out} = \frac{m}{\int_0^\infty c(t)\,dt}, \qquad g(t) = \frac{i_{out}\,c(t)}{m} = \frac{c(t)}{\int_0^\infty c(t)\,dt} \qquad (7.26)$$


where Eq. (7.25) has been used. Aside from the impulse response g(t), it is thus possible to calculate the outflow rate $i_{out}$, provided that the output concentration can be measured over a sufficiently long time interval. Such a calculated flow rate is called clearance in biomedicine.
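Eqs. (7.25) and (7.26) translate directly into a computation on a sampled concentration curve. The sketch below, using trapezoidal integration, estimates the clearance i_out and the distribution volume V = i_out ∫ t g(t) dt. The synthetic one-compartment washout curve c(t) = (m/V) e^{-(i_out/V)t} and its parameter values are assumptions used only to check that the known values are recovered.

```python
import math


def trapezoid(t, f):
    """Trapezoidal approximation of the integral of the samples f over the grid t."""
    return sum(0.5 * (t[i + 1] - t[i]) * (f[i + 1] + f[i]) for i in range(len(t) - 1))


def clearance_and_volume(t, c, m):
    """Outflow rate i_out, Eq. (7.26), and distribution volume V = i_out * mean transit time, Eq. (7.25)."""
    area = trapezoid(t, c)
    i_out = m / area
    g = [ck / area for ck in c]                          # impulse response g(t), Eq. (7.26)
    mean_transit = trapezoid(t, [tk * gk for tk, gk in zip(t, g)])
    return i_out, i_out * mean_transit


# Synthetic single-compartment washout curve (assumed): c(t) = (m/V) exp(-(i_out/V) t).
m, V_true, i_true = 10.0, 5.0, 2.0
t = [0.01 * k for k in range(6001)]                      # 0 to 60 time units
c = [m / V_true * math.exp(-i_true / V_true * tk) for tk in t]
i_est, V_est = clearance_and_volume(t, c, m)
print(i_est, V_est)                                      # close to 2.0 and 5.0
```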


7.5 PRINCIPLES OF MODELING 

Although modeling is in many respects a craft which is strongly dependent on 
the purpose of the model, there exist some general principles that can be used 
as guidelines for sound modeling. As modeling techniques would seem to have 
reached their highest level of maturity in mechanics and thermodynamics, the 
following guidelines derive from these branches of science. 

Physical balance equation 

The conservation of mass, energy, and momentum are fundamental principles of physical modeling and apply to systems modeled by mechanical, electrical, chemical, and thermodynamic laws. The balance equation of some subsystem takes the form

$$\text{accumulation flow} = \text{inflow} - \text{outflow} \qquad (7.27)$$

which can be suitably modified to model production (sources) and consumption (sinks). In some instances there might be no justification for assuming the existence of a storage element (cf. Kirchhoff's laws of electrical circuits), in which case the static equilibrium takes the form

$$0 = \text{inflow} - \text{outflow} \qquad (7.28)$$


Flow variables add to zero at any connection that does not contain a storage element. The presence of storage elements of mass, energy, momentum, or some other variable leads to state equations. Storage elements of mass, energy, etc., are sometimes called containers, buffers, or compartments, and are ruled by equations of the form

$$\frac{d}{dt}(\text{state variable}) = \text{inflow} - \text{outflow} \qquad (7.29)$$
When several subsystems exchange mass, momentum, or energy, the balance and state equations give rise to a set of interacting equations of the type in Eq. (7.29).

Nature often acts so as to minimize energy, and many modeling problems describe a condition of equilibrium associated with a minimum of energy. The system strives toward a state of equilibrium, and a state of equilibrium persists only if it is stable. Hence, a system requires an input of energy to move away from a stable equilibrium, and systems often oscillate around the equilibrium when energy is conserved. Motion will grow if the system releases rather than consumes energy, as occurs in unstable systems.

Potentials, gradients, and flows 

The fluxes of a system are determined by certain potentials such as electric 
voltage, temperature, pressure, or concentration, and their gradients. Closely 
related to the notion of potential are energy concepts as used in physics. In 
contrast to flow variables, which add to zero at a connection point, the gradient 
variables are equal at connection points. 

Nonequilibrium conditions are characterized by flows whose directions are determined by the gradients. The flows disappear under equilibrium conditions, but they may also reach a steady state of time-invariant flows and gradients. In physics and chemistry, there are several phenomenological laws describing the relationship between flow and gradient variables, such as:

- Fick's law: The mass flow of diffusion is proportional to the mass concentration gradient in a physical medium.

- Fourier's law: Heat flow is proportional to the temperature gradient in a medium,

$$J = -kA\frac{dT}{dx} \qquad (7.30)$$

where J is the heat-transfer rate and dT/dx is the temperature gradient with respect to a spatial coordinate x. The positive constant k is the thermal conductivity, and A the cross-section area of the thermal flow.

- Ohm's law: Electric current is proportional to the electrical voltage.

Another example is Newton's law of viscosity, relating shearing force to velocity gradient. Similarly, by the law of mass action, the rate of an elementary chemical reaction is proportional to the product of the concentrations of the reacting components.

The coefficients between the gradients of the potentials and the corresponding flow variables under steady-state conditions are called conductivities (heat conductivity, electrical conductivity, admittance, etc.), the reciprocal coefficients being resistances (e.g., heat-flow resistance, aerodynamic resistance, electrical resistance, or impedance).
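As a small worked example of Fourier's law (7.30) and of the conductivity/resistance terminology, the following sketch computes the steady heat flow through a plane slab with a linear temperature profile, once directly and once as the temperature difference divided by the thermal resistance Δx/(kA). The wall data are hypothetical.

```python
def heat_flow(k, area, T_hot, T_cold, thickness):
    """Steady one-dimensional conduction through a slab, Fourier's law (7.30)."""
    dT_dx = (T_cold - T_hot) / thickness        # linear temperature profile
    return -k * area * dT_dx


def thermal_resistance(k, area, thickness):
    """Reciprocal of the slab conductance k * area / thickness."""
    return thickness / (k * area)


# Hypothetical wall: k = 0.8 W/(m K), A = 10 m^2, 0.2 m thick, 20 degC inside, 0 degC outside.
J = heat_flow(0.8, 10.0, 20.0, 0.0, 0.2)
R = thermal_resistance(0.8, 10.0, 0.2)
print(J, (20.0 - 0.0) / R)                      # both give about 800 W
```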

Entropy conditions 

Most processes with energy storage must also satisfy entropy conditions: not all behavior that is compatible with a given state is physically possible, and the entropy conditions restrict physical processes to be irreversible. Entropy describes the dispersal of energy, i.e., the natural tendency of spontaneous change toward states of higher entropy; cf. Eq. (7.19) and the second law of thermodynamics.

Entropy calculations are often used in statistical mechanics and thermodynamics, and have also been applied to communication theory in order to model information flow (Shannon) and coding.

Onsager reciprocal relations

An important law describing the relationship between flows, gradients, and entropy in physics is the Onsager reciprocal relations. Onsager derived his principle from statistical mechanics under the assumption of microscopic reversibility, that is, the symmetry of all mechanical equations of motion of individual particles with respect to time reversal. The relations hold when the fluxes J (heat flow, electrical current, chemical reaction rate, momentum, etc.) and generalized forces F (temperature gradient, electric potential gradient, chemical affinity, mechanical force, etc.) are chosen such that the entropy production per unit time may be written as

$$\frac{dS}{dt} = J^TF = F^TCF \geq 0 \qquad (7.31)$$

for some matrix C. At thermodynamic equilibrium, all processes stop and we have simultaneously

$$J = 0 \quad \text{and} \quad F = 0 \qquad (7.32)$$

If the fluxes J and forces F are related by linear phenomenological relationships

$$J = CF, \qquad \text{or} \qquad J_i = \sum_{j=1}^{n} C_{ij}F_j, \quad i = 1, \ldots, n \qquad (7.33)$$

then, according to the Onsager reciprocal relations, the coefficient matrix C is symmetric.
Example 7.5—Thermodiffusion

Consider the case of one-dimensional thermodiffusion along a spatial coordinate x, and assume the temperature to be T(x). For simple heat conduction we have for the heat flow and the temperature gradient

$$J_1 = kF_1, \qquad F_1 = -\frac{1}{T}\frac{dT}{dx} \qquad (7.34)$$

Now include the diffusion and consider the modified gradient-flow relationship

$$\begin{aligned} J_1 &= C_{11}F_1 + C_{12}F_2\\ J_2 &= C_{21}F_1 + C_{22}F_2 \end{aligned} \qquad F_1: \text{temperature gradient}, \quad F_2: \text{mass concentration gradient} \qquad (7.35)$$

where $C_{11}$ is the heat conductivity (Fourier's law) and $C_{22}$ the diffusion coefficient (Fick's law). The coefficients $C_{21}$ and $C_{12}$ describe the interference of the two irreversible processes of heat conduction and diffusion, $C_{21}$ representing the appearance of a concentration gradient when a temperature gradient is imposed (Soret effect) and $C_{12}$ the converse situation (Dufour effect). According to the Onsager relation we have in this case $C_{12} = C_{21}$. ■
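The Onsager symmetry and the requirement that the entropy production (7.31) be nonnegative for every choice of forces can be checked for a 2 × 2 coefficient matrix such as the one in (7.35). The coefficient values below are hypothetical; for a symmetric 2 × 2 matrix, nonnegative diagonal entries and a nonnegative determinant are equivalent to F^T C F ≥ 0 for all F.

```python
def fluxes(C, F):
    """Linear phenomenological relations J = C F, Eq. (7.33)."""
    return [sum(C[i][j] * F[j] for j in range(len(F))) for i in range(len(C))]


def entropy_production(C, F):
    """Entropy production J^T F = F^T C F, Eq. (7.31)."""
    return sum(Ji * Fi for Ji, Fi in zip(fluxes(C, F), F))


def is_admissible_2x2(C, tol=1e-12):
    """Onsager symmetry plus positive semidefiniteness for a 2x2 coefficient matrix."""
    symmetric = abs(C[0][1] - C[1][0]) < tol
    psd = C[0][0] >= 0 and C[1][1] >= 0 and C[0][0] * C[1][1] - C[0][1] * C[1][0] >= 0
    return symmetric and psd


C = [[2.0, 0.3], [0.3, 0.5]]          # hypothetical C11, C12 = C21, C22
print(is_admissible_2x2(C), entropy_production(C, [1.0, -2.0]))
```

A symmetric matrix with too large a cross-coefficient, such as [[1, 2], [2, 1]], fails the test, and indeed gives negative entropy production for F = (1, -1).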


7.6 PHYSICAL PARAMETRIZATIONS 

Modeling methods based on physical balance equations are often effective in describing system behavior, although there are always subsystems or cases that require some identification or measurement in order to complete the modeling. In such cases it is advantageous to retain the structure of the physical model when performing parameter estimation. Identification of such physical parameters poses a number of special problems, as demonstrated in the following example.

Example 7.6—A model of a DC-motor drive 

Consider the motor drive of Fig. 7.7 with the input torque $u$ as the control variable and the angular velocity $y = \dot q_1$ as the variable available for measurement. The motor drive with moment of inertia $J_1$ is coupled to a load via a torsion spring with spring constant $k$. The load is assumed to have a moment of inertia $J_2$ and a damping $d_2$. There is also a random load moment $v$ acting on the system. The angular positions of the shafts are designated $q_1$ and $q_2$, and the equations of motion are

$$
\begin{aligned}
J_1\ddot q_1 &= k(q_2 - q_1) + u \\
J_2\ddot q_2 &= -k(q_2 - q_1) - d_2\dot q_2 + v
\end{aligned}
\tag{7.36}
$$
Figure 7.7 A DC motor and a load $J_2$ with a flexible coupling modeled by a torsion spring with spring constant $k$.


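For modeling purposes we introduce the state vector

$$
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \dot q_1 \\ \dot q_2 \\ q_2 - q_1 \end{pmatrix}
\tag{7.37}
$$

which evolves according to the system equations

$$
\begin{aligned}
\frac{dx(t)}{dt} &= \begin{pmatrix} 0 & 0 & k/J_1 \\ 0 & -d_2/J_2 & -k/J_2 \\ -1 & 1 & 0 \end{pmatrix} x(t) + \begin{pmatrix} 1/J_1 \\ 0 \\ 0 \end{pmatrix} u(t) + \begin{pmatrix} 0 \\ 1/J_2 \\ 0 \end{pmatrix} v(t) \\
y(t) &= \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} x(t)
\end{aligned}
\tag{7.38}
$$

with the transfer function

$$
\frac{Y(s)}{U(s)} = G_u(s) = \frac{J_2s^2 + d_2s + k}{J_1J_2s^3 + J_1d_2s^2 + k(J_1 + J_2)s + kd_2}
\tag{7.39}
$$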


The physical parameters in the simulation model were $J_1 = 0.1$, $J_2 = 0.1$, $k = 10$, $d_2 = 1$, which leads to the discrete-time transfer function

$$
H_u(z) = \frac{B(z)}{A(z)} = \frac{0.848z^2 - 0.627z + 0.313}{z^3 - 0.986z^2 + 0.888z - 0.368}
\tag{7.40}
$$

in cases where the input $u$ is constant over each sampling interval $h$. Assume the noise model to be

$$
H_v(z) = \frac{C(z)}{A(z)} = \frac{z^3 - 1.8z^2 + 0.97z}{z^3 - 0.986z^2 + 0.888z - 0.368}
\tag{7.41}
$$

with noise variance $E\{v^2\} = \sigma^2 = 1$ and $E\{u^2\} = 1$, so that the signal-to-noise ratio at the input (input variance over noise variance) is equal to one. Artificial data obtained by simulation of the system in Eq. (7.40) and Eq. (7.41), sampled with $h = 0.1$, are shown in Fig. 7.8, and the spectrum analysis of the data is shown in Fig. 7.9.

Figure 7.8 Data from a DC-motor drive with input $u$ and output $y$ versus time [s], as described in Example 7.6
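The numbers in (7.39) and (7.40) can be cross-checked with two properties of sampling when the input is held constant over each interval (zero-order hold): the static gain is preserved, and each continuous-time pole $p$ maps to $e^{ph}$. The sketch below assumes that (7.40) was obtained in exactly this way with $h = 0.1$; the helper `polyval` is introduced here only for illustration.

```python
import math

def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients in descending powers (Horner's rule)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# Physical parameters of Example 7.6 and the sampling period h.
J1, J2, k, d2, h = 0.1, 0.1, 10.0, 1.0, 0.1

# Continuous-time transfer function (7.39), descending powers of s.
num_c = [J2, d2, k]
den_c = [J1 * J2, J1 * d2, k * (J1 + J2), k * d2]

# Discrete-time transfer function (7.40), descending powers of z.
B = [0.848, -0.627, 0.313]
A = [1.0, -0.986, 0.888, -0.368]

# Check 1: zero-order-hold sampling preserves the static gain, G_u(0) = H_u(1).
dc_cont = polyval(num_c, 0.0) / polyval(den_c, 0.0)
dc_disc = polyval(B, 1.0) / polyval(A, 1.0)

# Check 2: each pole p maps to exp(p*h), so the product of the discrete poles
# (equal to -A[3] for a monic cubic) is exp(h * sum of continuous poles), and
# the sum of the continuous poles is -den_c[1] / den_c[0] = -d2 / J2.
pole_sum = -den_c[1] / den_c[0]
prod_predicted = math.exp(h * pole_sum)
prod_from_A = -A[3]

print(f"static gain: continuous {dc_cont:.4f}, discrete {dc_disc:.4f}")
print(f"pole product: predicted {prod_predicted:.4f}, from A(z) {prod_from_A:.4f}")
```

Both static gains evaluate to 1 (that is, $1/d_2$), and the predicted pole product $e^{-hd_2/J_2} = e^{-1} \approx 0.3679$ agrees with the constant coefficient 0.368 of $A(z)$ to the three decimals given.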

A noteworthy feature is that the continuous-time representations (7.38) and (7.39) both have an advantage over the discrete-time model in that the parameters have a clear physical interpretation. We call such a representation a physical parametrization. Nevertheless, the physical parameters do not appear in the transfer function individually but as products, ratios, sums, or other aggregate parameters; for example, the denominator of (7.39) contains $J_1J_2$, $J_1d_2$, $k(J_1 + J_2)$, and $kd_2$.

The identifiability of each individual physical parameter therefore depends upon two conditions:

i. The identifiability of each coefficient of the transfer function

ii. The determination of the physical parameters from the transfer function coefficients

Figure 7.9 Coherence spectrum between input and output of the DC-motor drive of Example 7.6

Notice that determination of physical parameters via identification of the 
discrete-time coefficients requires one additional transformation, namely that 
from a discrete-time model to a continuous-time model. This is a nontrivial 
problem. 
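Step ii can be made concrete for the continuous-time model (7.39). The sketch below assumes that the coefficients of $G_u(s)$ have already been identified and normalized so that the denominator is monic; this normalization is an assumption, since the text does not prescribe one. Six coefficients are then determined by the four physical parameters $J_1$, $J_2$, $k$, $d_2$, so four of them suffice for the inversion and the remaining two serve as consistency checks. The function names are hypothetical.

```python
def monic_coefficients(J1, J2, k, d2):
    """Forward map: physical parameters -> coefficients of (7.39) with monic denominator,
    G_u(s) = (b0 s^2 + b1 s + b2) / (s^3 + a1 s^2 + a2 s + a3)."""
    lead = J1 * J2
    return (J2 / lead, d2 / lead, k / lead,
            J1 * d2 / lead, k * (J1 + J2) / lead, k * d2 / lead)

def physical_from_coefficients(b0, b1, b2, a1, a2, a3):
    """Inverse map using b0 = 1/J1, b2 = k/(J1 J2), a1 = d2/J2, a2 = k/J1 + k/J2.
    b1 and a3 are not needed and are returned as consistency residuals."""
    J1 = 1.0 / b0
    k_over_J2 = b2 * J1            # k/J2
    k_over_J1 = a2 - k_over_J2     # from a2 = k/J1 + k/J2
    k = k_over_J1 * J1
    J2 = k / k_over_J2
    d2 = a1 * J2
    residuals = (b1 - d2 / (J1 * J2), a3 - k * d2 / (J1 * J2))
    return J1, J2, k, d2, residuals

coeffs = monic_coefficients(0.1, 0.1, 10.0, 1.0)   # values of Example 7.6
J1, J2, k, d2, res = physical_from_coefficients(*coeffs)
print(f"J1 = {J1:.4g}, J2 = {J2:.4g}, k = {k:.4g}, d2 = {d2:.4g}, residuals = {res}")
```

The round trip returns $J_1 = J_2 = 0.1$, $k = 10$, $d_2 = 1$ with residuals at rounding level. With noisy estimates the residuals would not vanish, and a least-squares fit over all six coefficients would be a more robust choice than the exact inversion shown here.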


7.7 NETWORK MODELS 

As mentioned earlier, we distinguish between potentials (force, voltage, pressure) and flows (displacement, momentum, current, flow rate). The manner in which individual processes interconnect and act upon one another gives an organizational pattern to the overall process. Graphical descriptions are invaluable for network modeling, and it is natural to introduce nodes (or vertices), which define the connections of the components or subsystems, and edges (or branches), which always lie between two distinct nodes.

Figure 7.10 A circuit diagram and the corresponding currents and voltages (panels: circuit, currents, voltages).

The nodes are associated with potential (gradient) variables, whereas the edges are associated with flow variables. In network and graph models it is customary to use the synonyms node variable for potential and edge variable for flow. An advantage is that such model descriptions allow analysis by methods of graph theory or of linear systems. The key link between the two kinds of variables is the potential difference across each edge, which connects the edge variables to the node variables.

Example 7.7—An electrical network 

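Consider the circuit diagram in Fig. 7.10. Kirchhoff's current law provides the algebraic constraint

$$
0 = A^Ti = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & -1 & -1 & 0 \end{pmatrix} \begin{pmatrix} i_1 \\ i_2 \\ i_3 \\ i_4 \\ i_5 \\ i_6 \end{pmatrix}
\tag{7.42}
$$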


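The matrix $A$ is called the connectivity matrix (or incidence matrix) because it is determined from the connections of the network. Kirchhoff's voltage law evaluated along the branches of the network provides the equations

$$
Av = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & -1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} i_1R_1 \\ i_2R_2 \\ i_3R_3 \\ i_4R_4 \\ i_5R_5 + L_5\,di_5/dt \\ i_6R_6 + u \end{pmatrix}
\tag{7.43}
$$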
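The right-hand side of (7.43) is the sum of three terms: the resistive term $Ri$ with $R = \operatorname{diag}(R_1,\dots,R_6)$, the source term $G_uu$ with $G_u = (0\;0\;0\;0\;0\;1)^T$, and the inductive term $L(di/dt)$ with $L = \operatorname{diag}(0,0,0,0,L_5,0)$. These equations can then be organized as follows

$$
\begin{pmatrix} L & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} di/dt \\ dv/dt \end{pmatrix} = \begin{pmatrix} -R & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} i \\ v \end{pmatrix} - \begin{pmatrix} G_u \\ 0 \end{pmatrix} u
\tag{7.45}
$$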


which takes on the familiar form $E\dot x = Fx + Gu$ with $x = (i^T\; v^T)^T$ as the state variables and $u$ as the external input variable. The network models are thus suitable both for graph-theoretic analysis and for linear system analysis, such as state-space models and transfer function models.


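♦Analysis of $E\dot x = Fx + Gu$

Application of the singular value decomposition $E = U\Sigma V$ gives

$$
U\Sigma V\dot x = Fx + Gu
\tag{7.46}
$$

Introducing the transformed variable $z = Vx$ gives

$$
\Sigma\dot z = U^{-1}FV^{-1}z + U^{-1}Gu = F'z + G'u
\tag{7.47}
$$

where

$$
\Sigma = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix}
\tag{7.48}
$$

The matrix $\Sigma_1$ is a full-rank diagonal matrix containing the nonzero singular values of $E$. The rank deficit of $E$ is reflected in the zero diagonal components of $\Sigma$, which suggests the state-space decomposition $z = (z_1^T\; z_0^T)^T$ and

$$
F' = U^{-1}FV^{-1} = \begin{pmatrix} F'_{11} & F'_{10} \\ F'_{01} & F'_{00} \end{pmatrix}, \qquad G' = U^{-1}G = \begin{pmatrix} G'_1 \\ G'_0 \end{pmatrix}
\tag{7.49}
$$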
(7.49) 
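so that

$$
\begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \dot z_1 \\ \dot z_0 \end{pmatrix} = \begin{pmatrix} F'_{11} & F'_{10} \\ F'_{01} & F'_{00} \end{pmatrix} \begin{pmatrix} z_1 \\ z_0 \end{pmatrix} + \begin{pmatrix} G'_1 \\ G'_0 \end{pmatrix} u
\tag{7.50}
$$

Such a system is sometimes called a differential-algebraic system because of the decomposition into an algebraic equation for $z_0$ and a differential equation for $z_1$. Provided that $F'_{00}$ is invertible, there is thus a static relationship which determines $z_0$ as a function of $z_1$ and the input variable $u$, so that

$$
F'_{01}z_1 + F'_{00}z_0 + G'_0u = 0 \quad\Rightarrow\quad z_0 = -F'^{-1}_{00}\left(F'_{01}z_1 + G'_0u\right)
\tag{7.51}
$$

Substitution of $z_0$ from Eq. (7.51) in Eq. (7.50) gives the unconstrained dynamical system

$$
\dot z_1 = \Sigma_1^{-1}\left(F'_{11} - F'_{10}F'^{-1}_{00}F'_{01}\right)z_1 + \Sigma_1^{-1}\left(G'_1 - F'_{10}F'^{-1}_{00}G'_0\right)u
\tag{7.52}
$$

which can be further analyzed or simulated with standard methods. The original state-space variable may be recovered as

$$
x = V^{-1}z = V^{-1}\begin{pmatrix} z_1 \\ -F'^{-1}_{00}\left(F'_{01}z_1 + G'_0u\right) \end{pmatrix}
\tag{7.53}
$$

Figure 7.11 A mechanical network with springs on the edges (the spring constants are denoted $k_i$) and masses ($m_i$) at the nodes.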
Figure 7.12 Relationships between external forces and displacements in mechanical structures with structural stiffness $K = A^TSA$.
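The elimination in (7.51)-(7.52) can be sketched for the simplest case in which every block $\Sigma_1, F'_{11}, \dots, G'_0$ is a scalar. The numbers and function names below are hypothetical, and forward Euler is used only as a convenient way to simulate the reduced equation; after the simulation the algebraic state $z_0$ is recovered from (7.51).

```python
def reduce_dae(sigma1, F11, F10, F01, F00, G1, G0):
    """Scalar-block version of (7.50)-(7.52):
         sigma1 * dz1/dt = F11*z1 + F10*z0 + G1*u
                       0 = F01*z1 + F00*z0 + G0*u
    Eliminates z0 and returns (a, b) such that dz1/dt = a*z1 + b*u."""
    if F00 == 0:
        raise ValueError("F00 must be invertible to solve for z0")
    a = (F11 - F10 * F01 / F00) / sigma1
    b = (G1 - F10 * G0 / F00) / sigma1
    return a, b

def algebraic_state(z1, u, F01, F00, G0):
    """z0 = -F00^{-1} (F01*z1 + G0*u), Eq. (7.51)."""
    return -(F01 * z1 + G0 * u) / F00

# Hypothetical block values (the primes are dropped in the names).
sigma1, F11, F10, F01, F00, G1, G0 = 2.0, -1.0, 0.5, 1.0, -2.0, 1.0, 1.0
a, b = reduce_dae(sigma1, F11, F10, F01, F00, G1, G0)

# Forward-Euler simulation of the reduced system for a unit step in u.
z1, u, dt = 0.0, 1.0, 1e-3
for _ in range(20000):               # 20 time units
    z1 += dt * (a * z1 + b * u)
z0 = algebraic_state(z1, u, F01, F00, G0)
print(f"a = {a}, b = {b}, z1 = {z1:.4f}, z0 = {z0:.4f}")
```

With these numbers the reduced system is $\dot z_1 = -0.375z_1 + 0.625u$, whose step response settles near $z_1 = 5/3$; the recovered $z_0$ satisfies the algebraic row of (7.50) by construction.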


Structure of the network models 

The algebraic structure of the network models offers good opportunities for 
analysis as exemplified in the following: 


Example 7.8—The connectivity matrix and the stiffness matrices 
Consider the network in Fig. 7.11 with springs on the edges and masses at the nodes. Forces $F_i$ acting at the network nodes will result in deformation of the structure, and the elongation $e_j$ of the $j$th spring in the three spatial components $(x, y, z)$ is determined by the positions $x$ of the masses $m_i$ according to

$$
e = Ax
\tag{7.54}
$$


where the matrix $A$ is called a connectivity matrix (or incidence matrix) because it is determined from the edges of the network. The external forces $F_i$ result in internal forces $P_j$ in the $j$th spring, which are proportional to the elongations $e_j$, i.e.,

$$
P = Se = \begin{pmatrix} k_1I_{3\times3} & 0 & \cdots & 0 \\ 0 & k_2I_{3\times3} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & k_6I_{3\times3} \end{pmatrix} e
\tag{7.55}
$$
where the diagonal matrix $S$ is composed of stiffness coefficients according to Hooke's law. The work done by the external forces is absorbed by the internal work of stretching the springs, so that $x^TF = e^TP = x^TA^TP$ if there is energy conservation in the network. The external forces $F$ thus balance the internal forces $P$ for

$$
F = A^TP = \begin{pmatrix} I_{3\times3} & 0 & 0 & I_{3\times3} & 0 & I_{3\times3} \\ 0 & I_{3\times3} & 0 & 0 & I_{3\times3} & -I_{3\times3} \\ 0 & 0 & I_{3\times3} & -I_{3\times3} & -I_{3\times3} & 0 \end{pmatrix} P
\tag{7.56}
$$

The potential energy is determined by the expression

$$
V = \tfrac{1}{2}x^TA^TSAx - x^TF
\tag{7.57}
$$

where the first term is the strain energy in the set of springs, whereas the second term is the potential energy of the external forces $F$. The kinetic energy of the network is

$$
T = \tfrac{1}{2}\dot x^TM\dot x = \tfrac{1}{2}\dot x^T \begin{pmatrix} m_1I_{3\times3} & 0 & 0 \\ 0 & m_2I_{3\times3} & 0 \\ 0 & 0 & m_3I_{3\times3} \end{pmatrix}\dot x
\tag{7.58}
$$

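The total energy is then $H = T + V$, and dynamic equilibrium in the case of energy conservation is obtained for $0 = dH/dt = (\partial V/\partial x)^T\dot x + (\partial T/\partial \dot x)^T\ddot x$, where the gradients are

$$
\frac{\partial V}{\partial x} = A^TSAx - F, \qquad \frac{\partial T}{\partial \dot x} = M\dot x
\tag{7.59}
$$

The Euler-Lagrange equations then yield the equations of motion

$$
M\ddot x = -A^TSAx + F
\tag{7.60}
$$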

The static equilibrium occurs where $\partial V/\partial x = 0$, which is obtained for $F = Kx$, where $K = A^TSA$ is the structural stiffness matrix; see Fig. 7.12. The corresponding transfer function relationship between $F(s)$ and $X(s)$ is

$$
X(s) = (Ms^2 + K)^{-1}F(s)
\tag{7.61}
$$
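The static part of Example 7.8 can be checked numerically. Because every block in (7.55) and (7.56) is a multiple of $I_{3\times3}$, the $x$, $y$, and $z$ components decouple, and the sketch below works with one spatial component only. The spring constants and nodal forces are hypothetical. It forms $K = A^TSA$, solves $F = Kx$ for the static equilibrium, and confirms the work balance $x^TF = e^TP$.

```python
# Scalar (one spatial component) version of the connectivity pattern in (7.56);
# the full matrix repeats this pattern with every entry multiplied by I_{3x3}.
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 0, -1],
     [0, 1, -1],
     [1, -1, 0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(K, F):
    """Solve K x = F by Gaussian elimination with partial pivoting."""
    n = len(K)
    M = [list(map(float, K[i])) + [float(F[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

k = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # hypothetical spring constants k_1..k_6
S = [[k[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
K = matmul(matmul(transpose(A), S), A)       # structural stiffness K = A^T S A

F = [1.0, 0.0, -0.5]                         # hypothetical external nodal forces
x = solve(K, F)                              # static equilibrium F = K x
e = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # elongations e = A x
P = [ki * ei for ki, ei in zip(k, e)]                     # spring forces P = S e
work_ext = sum(xi * fi for xi, fi in zip(x, F))          # x^T F
work_int = sum(ei * pi for ei, pi in zip(e, P))          # e^T P
print(K)
print(x, work_ext, work_int)
```

With these spring constants $K = \begin{pmatrix} 11 & -6 & -4 \\ -6 & 13 & -5 \\ -4 & -5 & 12 \end{pmatrix}$: each diagonal entry is the sum of the constants of the springs attached to that node, and each off-diagonal entry is minus the constant of the spring joining the two nodes.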
Identification and analysis of the natural frequencies and other properties of the transfer function of Eq. (7.61) are known as modal analysis in mechanical engineering. The eigenvectors and eigenvalues of the generalized eigenvalue problem

$$
\det(M\lambda + K) = 0
\tag{7.62}
$$

determine the resonance modes $\pm\sqrt{\lambda}$ of the network (with $\lambda$ playing the role of $s^2$ in (7.61), these are the poles of $X(s)$) and provide important information about how to detect and avoid vibration and resonance. Modal analysis also makes use of spectrum analysis; see Chapter 4.

Constraints 



Consider the electrical network in Fig. 7.13. The voltages of the capacitors $C_1$ and $C_2$ and the current $i_3$ constitute a dynamical system with certain static constraints.


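$$
\begin{aligned}
&\text{Kirchhoff's current law:} && i_1 = i_2 + i_3 \\
&\text{Kirchhoff's voltage law:} && v_2 = Ri_3 + v_L \\
&\text{Kirchhoff's voltage law:} && u = v_1 + v_2
\end{aligned}
$$

The balance equations, together with the element relations $i_1 = C_1\,dv_1/dt$, $i_2 = C_2\,dv_2/dt$, and $v_L = L\,di_3/dt$, determine a state-space equation with certain static constraints

$$
\begin{pmatrix} C_1 & -C_2 & 0 \\ 0 & 0 & L \\ 0 & 0 & 0 \end{pmatrix} \frac{d}{dt}\begin{pmatrix} v_1 \\ v_2 \\ i_3 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & -R \\ -1 & -1 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ i_3 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} u
\tag{7.63}
$$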


This system equation is of the form

$$
E\dot x = Fx + Gu
\tag{7.64}
$$
with $E$ a singular matrix, whereas $F$ has full rank. The system (7.63) is an example of a differential-algebraic system, with system dynamics determined by a set of differential equations and constrained by algebraic equations. The transfer function from $u$ to $i_3$ is

$$
g(s) = \frac{I_3(s)}{U(s)} = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}(sE - F)^{-1}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \frac{sC_1}{s^2L(C_1 + C_2) + sR(C_1 + C_2) + 1}
\tag{7.65}
$$


which is a second-order linear system rather than the third-order system that might be expected from the three energy-storage elements $C_1$, $C_2$, and $L$; the algebraic constraint $u = v_1 + v_2$ ties the two capacitor voltages together, so only one of them is an independent state. In such systems the question therefore arises as to which variables should be chosen as independent variables and which as dependent variables. ■
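The transfer function (7.65) can be verified by evaluating $(0\;0\;1)(sE - F)^{-1}(0\;0\;1)^T$ directly from the matrices of (7.63) at a few complex frequencies and comparing with the closed-form second-order expression. The component values are hypothetical. Note that $sE - F$ is invertible at these points even though $E$ itself is singular, which is what makes the transfer function of the descriptor form well defined.

```python
def solve(M, rhs):
    """Solve M x = rhs (complex entries allowed) by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def descriptor_tf(s, C1, C2, L, R):
    """(0 0 1)(sE - F)^{-1}(0 0 1)^T for (7.63) with state (v1, v2, i3)."""
    E = [[C1, -C2, 0.0], [0.0, 0.0, L], [0.0, 0.0, 0.0]]
    F = [[0.0, 0.0, 1.0], [0.0, 1.0, -R], [-1.0, -1.0, 0.0]]
    sE_minus_F = [[s * E[r][c] - F[r][c] for c in range(3)] for r in range(3)]
    return solve(sE_minus_F, [0.0, 0.0, 1.0])[2]   # the i3 component

def closed_form_tf(s, C1, C2, L, R):
    """Right-hand side of (7.65)."""
    return s * C1 / (s ** 2 * L * (C1 + C2) + s * R * (C1 + C2) + 1.0)

params = dict(C1=1.0, C2=2.0, L=0.5, R=0.3)   # hypothetical component values
for s in (0.7j, 1.0 + 2.0j, 3.0):
    print(s, descriptor_tf(s, **params), closed_form_tf(s, **params))
```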


Passivity 


Figure 7.14 A network with the input $v(t)$ and the output $i(t)$.


Consider the network in Fig. 7.14 with input $v(t)$ and output $i(t)$. A passive network is characterized by the following relationship between the input $v(t)$ and the output $i(t)$:

$$
E = \int_{-\infty}^{t} i(s)v(s)\,ds \ge 0 \quad \text{for all } t
\tag{7.66}
$$

The quantity (7.66) represents an interaction energy between input and output (see Section 3.7), and it has the physical interpretation of energy in cases where the input is a gradient variable (voltage) and the output a flow variable (current). Passivity is a property of resistors ($R > 0$) because

$$
E_R = \int_{-\infty}^{t} i(s)v(s)\,ds = \int_{-\infty}^{t} Ri^2(s)\,ds \ge 0
\tag{7.67}
$$
Passivity is also a property of inductors and capacitors, since

$$
E_L = \int_{-\infty}^{t} i(s)v(s)\,ds = \int_{-\infty}^{t} i(s)L\frac{di(s)}{ds}\,ds = \tfrac{1}{2}Li^2(t) \ge 0
$$

$$
E_C = \int_{-\infty}^{t} i(s)v(s)\,ds = \int_{-\infty}^{t} C\frac{dv(s)}{ds}v(s)\,ds = \tfrac{1}{2}Cv^2(t) \ge 0
$$

The passivity conditions are one important way of characterizing a system 
or subsystem as devoid of energy sources while allowing storage elements. 
Passivity has proved to be an important factor in stability analysis. 
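The inductor identity $E_L = \tfrac{1}{2}Li^2(t)$ can be checked numerically. The sketch assumes that the element starts with zero stored energy, so the integral is started at $s = 0$ with $i(0) = 0$ instead of at $-\infty$; the current waveform $i(s) = \sin s$ and the element values are hypothetical, and the trapezoidal rule is used for the integrals.

```python
import math

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    inner = sum(f(a + j * h) for j in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + inner)

R, L = 2.0, 0.5              # hypothetical resistance and inductance
i = math.sin                 # hypothetical current i(s), with i(0) = 0
di = math.cos                # its derivative di/ds

t = 2.5
E_R = trapezoid(lambda s: i(s) * (R * i(s)), 0.0, t)    # resistor: v = R i
E_L = trapezoid(lambda s: i(s) * (L * di(s)), 0.0, t)   # inductor: v = L di/ds
print(f"E_R = {E_R:.6f}, E_L = {E_L:.6f}, L i(t)^2 / 2 = {0.5 * L * i(t) ** 2:.6f}")
```

Both quantities stay nonnegative. $E_L$ rises and falls with $i^2(t)$ because the inductor returns the energy it has stored, whereas $E_R$ only grows, reflecting dissipation.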


7.8 HISTORICAL AND BIBLIOGRAPHICAL REMARKS 

Speculation as to the nature of mathematical models has a long history. For instance, the ancient Pythagorean natural philosophy held that the “real” was the mathematical harmony present in nature. According to this philosophy, mathematical relationships that fit natural phenomena constitute valid explanations of why things are as they are. The alternative view is that mathematical models are computational devices that should be distinguished from theories about physical structure.

Another time-honored issue is model complexity. The fourteenth-century English philosopher William of Occam used simplicity as a criterion of concept formation and modeling. In his view it is desirable to eliminate superfluous concepts, so that the simpler of two theories that account for a particular phenomenon is to be preferred. This methodological principle favoring low model complexity is often referred to as “Occam’s razor.”

Modeling and identification as a methodology dates back to Galileo (1564-1642), who is also important as the founder of dynamics. Modeling of phenomena and comparison with experimental data also stimulated the development of statistics as a means of accounting for inexact measurement and uncertainty in data. Much successful modeling has thus been done using deterministic modeling of an object and statistical modeling of environmental perturbations. However, the practice of extensive statistical modeling in system identification has occasionally been challenged on the grounds that it might obstruct the analysis of trade-offs between approximation and model complexity. It should be borne in mind that a substantial part of statistical terminology can be viewed as a set of modeling assumptions with valuable abstractions, such as the notions of uncertainty, probability distribution functions, discrete-time white noise, uncorrelated independent variables with spectra, and autocorrelation functions. White noise in continuous-time modeling is more difficult to treat, and is thus less popular, because it presumes an infinite power that is clearly unrealistic and in contradiction to the precepts of standard physical modeling. Another active area of research in control theory and statistics is that of discrete event models, which are used to model sequence control and manufacturing systems. Yet another branch of statistical modeling is game theory, e.g., differential games used to model hostile environments.

The theory of stochastic processes based on the ideas of white noise and Brownian motion is to be found, for instance, in

- R.S. Liptser and A.N. Shiryayev, Statistics of Random Processes. Part I: General Theory. Part II: Applications. New York: Springer-Verlag, 1977.

Modeling with the methods of classical mechanics dates back to Newton in the seventeenth and early eighteenth centuries, and is probably the oldest modeling methodology in use. The energy methods of Lagrange, Hamilton, and others date back to the eighteenth and nineteenth centuries. A good theoretical source is the work:

- R. Abraham and J.E. Marsden, Foundations of Mechanics, 2nd ed. Reading, MA: Addison-Wesley, 1978.

Several modeling techniques including networks, variational methods, and 
bond graphs are presented in 

- P.E. Wellstead, Introduction to Physical System Modelling. London: 
Academic Press, 1979. 

- F.E. Cellier, Continuous System Modeling. Berlin and New York: Springer-Verlag, 1990.

Thermodynamic theory and modeling are described in 

- F. Reif, Fundamentals of Statistical and Thermal Physics. New York: 
McGraw-Hill, 1965. 

- R. Haase, Thermodynamics of Irreversible Processes. Reading, MA: Addison-Wesley, 1969; New York: Dover, 1990.

Variational methods, partial differential equations, approximation theory, and 
numerical methods are presented in a more academic style in the following 
work: 
- R. Dautray and J.-L. Lions, Analyse Mathématique et Calcul Numérique. Paris: Masson, 1984.

All practical modeling involves some kind of simulation, which provides experience in experimental dynamics. Simulation of physical processes such as airflow over an aircraft wing poses challenges to modeling, with many difficulties even when substantial computing power is available. A present focus of interest is the chaotic behavior of dynamical systems, which was discovered in simulations of certain nonlinear differential equations. Special topics in modeling such as chaos and catastrophe theory are to be found in the following books:

- J. Guckenheimer and P. Holmes, Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. New York: Springer-Verlag, 1983.

- R. Thom, Stabilité Structurelle et Morphogenèse. Paris: InterEditions, 1977.

Compartmental models began to be used in the 1940s with the use of tracer 
experiments in natural sciences, and such models are currently in routine use 
in pharmacokinetics and ecological sciences. 

- C. Cobelli and K. Thomaseth, “Optimal input design for identification of compartmental models—Theory and application to a model of glucose kinetics.” Math. Biosciences, Vol. 77, 1985, pp. 267-286.

- J.A. Jacquez, Compartmental Analysis in Biology and Medicine. New York: Elsevier, 1972.

The technology of modal testing as used in mechanical engineering is covered in

- D.J. Ewins, Modal Testing: Theory and Practice. New York: John Wiley, 1984.


7.9 EXERCISES 

7.1 Determine the elimination coefficient $r$ in the compartmental model

$$
\begin{aligned}
\dot x(t) &= -rx(t) \\
y(t) &= cx(t)
\end{aligned}
\tag{7.68}
$$
7.2 The serum concentration of an antibiotic drug exhibits the following course after oral intake:

Time (h):   0.5    1      1.5    2      3      4      6      8      12
200 mg:     0.45   0.65   0.55   0.45   0.30   0.25   0.20   0.15   0.10
400 mg:     0.85   1.25   1.25   1.00   0.70   0.65   0.45   0.30   0.20
600 mg:     1.45   1.75   1.85   1.45   1.05   0.85   0.55   0.40   0.25

The serum concentration after a 30-minute infusion, as measured from the end of the infusion, is as follows:

Time:     5 min  10 min  15 min  30 min  45 min  1 h    2 h    4 h    8 h    12 h
25 mg:    0.75   0.60    0.55    0.40    0.35    0.30   0.20   0.15   0.05   0.05
50 mg:    1.55   1.20    1.05    0.85    0.65    0.60   0.45   0.30   0.15   0.05

The maximal serum concentration after an oral dose is thus, to a good approximation, proportional to the dose given (see Fig. 7.15). Formulate a compartmental model that reproduces the above data.


7.3 Consider the following relationship (sometimes called a logistic curve)

$$
y(u) = x(u) + \varepsilon, \qquad x(u) = \frac{1}{1 + \exp(-(\alpha + \beta u))}
\tag{7.69}
$$

between the measured variables $u$ and $y$. The disturbance $\varepsilon$, with mean $E\{\varepsilon\} = 0$, is not available for measurement. Show that the transformation

$$
\log\frac{x}{1 - x}
\tag{7.70}
$$

is helpful in order to formulate the estimation of $\alpha$ and $\beta$ from experimental data $y, u$ as a linear regression problem.
Figure 7.15 Upper diagram shows the serum concentration after oral intake of the drug (200 mg, 400 mg, and 600 mg). Lower graphs show the serum concentration after an i.v. infusion (25 mg and 50 mg). All graphs are plotted versus time.


7.4 A model of a water tank with cross-sectional area $A$ and outlet area $a$ is given by

$$
A\frac{dh}{dt} = q_{in} - a\sqrt{2gh}
$$

where $q_{in}$ is the inflow and $h$ the water level.

a. Assume that there is no inflow, i.e., $q_{in} = 0$. Show that $h(t)$ is then given by

$$
h(t) = h_0\left(1 - \frac{t}{T}\right)^2, \qquad 0 \le t \le T
$$

where $h_0$ is the water level at time $t = 0$, and $T = \dfrac{A}{a}\sqrt{\dfrac{2h_0}{g}}$.

b. Devise a model of the form $y(t) = \varphi(t)\theta$ for the estimation of $\theta = T$.

c. Assume that $h_0 = 10$. Make a least-squares estimation of $T$ based on the following data:

t:      1     2     3     4
h(t):   8.9   7.4   6.3   5.5
7.5 The rigid body dynamics for a two-link robot (see Fig. 7.16) are given by

$$
\begin{aligned}
\tau_1 &= m_2l_2^2(\ddot q_1 + \ddot q_2) + m_2l_1l_2c_2(2\ddot q_1 + \ddot q_2) + (m_1 + m_2)l_1^2\ddot q_1 - m_2l_1l_2s_2\dot q_2^2 \\
&\quad - 2m_2l_1l_2s_2\dot q_1\dot q_2 + m_2l_2gc_{12} + (m_1 + m_2)l_1gc_1 \\
\tau_2 &= m_2l_1l_2c_2\ddot q_1 + m_2l_1l_2s_2\dot q_1^2 + m_2l_2gc_{12} + m_2l_2^2(\ddot q_1 + \ddot q_2)
\end{aligned}
\tag{7.71}
$$

where $m_i$ and $l_i$ are the masses and the lengths of the links, and the notation $c_i = \cos(q_i)$, $s_i = \sin(q_i)$, $c_{12} = \cos(q_1 + q_2)$, etc. is used.

a. Assume that $q_i$, $\dot q_i$, $\ddot q_i$, and $\tau_i$ are measurable, and that $l_1$ and $l_2$ are known. Devise a model for identification of the parameters $m_1$ and $m_2$.

b. Devise an identification model for the case where only $q_i$, $\dot q_i$, and $\tau_i$ are measurable.

c. Is it possible to identify the parameters $l_1$, $l_2$, $m_1$, and $m_2$? Assume that the measurable variables are as in a.

Figure 7.16 The two-link robot in Exercise 7.5.

7.6 The dissolved oxygen dynamics in an open aerator of an activated sludge system are given by

$$
\dot y(t) = a(t)u(t)\left(c(t) - y(t)\right) - R(t)
\tag{7.72}
$$

where $y$ is the dissolved oxygen concentration, $R$ the respiration rate, $au$ the oxygen transfer rate, and $c$ a time-varying coefficient. A typical goal is to estimate $a$ and $R$ online during control of $y$ via the input $u$. To simplify the problem, consider only discrete-time estimation of $R$.

a. Suppose that the sampling period $h$ is chosen such that $a$, $u$, $c$, and $R$ can be regarded as constant between the sampling instants. Devise a model of the form

$$
\dot y(kh) = a(kh)u(kh)\left(c(kh) - y(kh)\right) - R(kh)
\tag{7.73}
$$

where $\dot y(kh)$ can be computed exactly from $y(kh)$, $y(kh + h)$, $a(kh)$, and $u(kh)$.

b. Compare the formula for computation of $\dot y(kh)$ with a forward Euler approximation, i.e., $\dot y(kh) \approx (y(kh + h) - y(kh))/h$. Discuss the choice of sampling period for the two cases.

7.7 For the ball and beam process shown in Fig. 7.17,

a. Devise a dynamical model for the system. The mass of the ball is $m$ and its moment of inertia is $J_1 = amr^2$, where $r$ is the radius of the ball. The mass of the beam is $M$ and its moment of inertia $J_2$. The applied torque is denoted $\tau$.

b. Formulate a model for the identification of $m$. Assume that $x$, $\dot x$, $\varphi$, and $\tau$ are measurable, and that $M$, $J_2$, and $d$ are known.

Figure 7.17 The ball and beam process in Exercise 7.7.

7.8 In contrast to other network models, it appears that compartment models are described only in terms of the transfer of material between the compartments. In the terminology of network analysis such transfer corresponds to a set of flow variables $J = \dot x = Ax$ with a matrix $A$ containing the transfer coefficients. Show that one can associate with the compartment model a potential $V(x) = x^TPx$ with $P = P^T > 0$ and a set of gradient variables $F = -\partial V/\partial x$, and determine conditions for which $F^TJ > 0$. ■



8

The Experimental Procedure


8.1 INTRODUCTION 

This chapter discusses principles and methods to guide the experimental procedure. Problems concerning the experimental conditions, in particular the choice of input, and problems concerning identification of systems in closed-loop operation are treated. The end of this chapter gives some practical advice on the planning of experiments.
8.2 THE EXPERIMENTAL CONDITION

There are several conditions that have to be imposed to obtain good experimental results and data that fulfill the prerequisites of the identification methods used. The choice of input is obviously of great importance for the outcome of the identification. Another important aspect is the presence of regulators in closed-loop operation and the complexity of such regulators. A major part of identification theory considers time-invariant methods for open-loop systems; there are far fewer methods for analysis of closed-loop systems, and very few methods apply to identification of adaptive systems. A third aspect is the presence of subsystems in discrete-time operation, which may interfere with discrete-time measurement and data collection. All method prerequisites and circumstances of the identification procedure that have to be fulfilled at the time of the experiment are called the experimental condition. An objective of this chapter is to give some hints and guidelines for the appropriate design of experiments.


8.3 IDENTIFICATION AND CLOSED-LOOP CONTROL 

There are several interactions between identification and control that are of importance for the result of identification. For example, a variable time delay between measurement and control will result in a variable phase delay, which, in turn, may result in a variable model order of the identified model. A necessary requirement is therefore that there is a well-defined synchronization between discrete-time measurement and control.

Example 8.1—A system operating with feedback control 
Consider a system according to Fig. 8.1 with the following signal relations

$$
\begin{cases} x_k = H_P(z)u_k \\ y_k = x_k + v_k \end{cases}
\tag{8.1}
$$

with input $u$, disturbance $v$, and output $y$. Assume that the regulator set point is zero. The control variable is thus determined from

$$
u_k = -H_R(z)y_k
\tag{8.2}
$$
Figure 8.1 A system with feedback control.


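and the resulting closed-loop system is

$$
\left(1 + H_P(z)H_R(z)\right)y_k = v_k
\tag{8.3}
$$

A simple but uninteresting relation between $u$ and $y$ can be obtained from the regulator equation (8.2):

$$
y_k = -\frac{1}{H_R(z)}u_k
\tag{8.4}
$$

Because (8.4) holds exactly for the closed-loop data, a method that does not respect causality may return the inverse regulator $-1/H_R(z)$ rather than $H_P(z)$. A conclusion is that there are potential problems of non-uniqueness for closed-loop identification with spectral estimation methods and other methods that do not respect causality. ■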


Example 8.2—Time-series analysis of a closed-loop system 
Consider a system described by the equation

$$
y_{k+1} = -ay_k + bu_k + v_{k+1}
\tag{8.5}
$$

Assume that the regulator is a simple proportional controller with

$$
u_k = -Ky_k \quad\Rightarrow\quad u_k + Ky_k = 0
\tag{8.6}
$$

It is, of course, possible to add to (8.5) the zero-valued term $\lambda(u_k + Ky_k) = 0$, where $\lambda$ is some arbitrary Lagrange multiplier, so that

$$
y_{k+1} = -ay_k + bu_k + v_{k+1} + \lambda(u_k + Ky_k)
\tag{8.7}
$$









Figure 8.2 Subspaces of ARX parameters with identified parameters on the ordinate as functions of the original parameters $a$ and $b$ and some arbitrary multiplier $\lambda$.

By collecting terms we obtain

$$
y_{k+1} = (-a + K\lambda)y_k + (b + \lambda)u_k + v_{k+1}
\tag{8.8}
$$

The system equation (8.8) is thus no longer characterized by unique system parameters $a, b$. Instead, any set of coefficients $-a + K\lambda$ and $b + \lambda$, where $\lambda$ is arbitrary, is an adequate set of parameters (see Fig. 8.2).

An important observation is that it is not sufficient to know the regulator parameters: the parametric model does not admit any unique solution even if the control law is explicitly known. ■

Example 8.3—Identification in a closed-loop system

Assume that the system dynamics is described by the equation

$$
y_k + ay_{k-1} = bu_{k-1} + e_k
\tag{8.9}
$$

with the “white-noise” properties

$$
E\{e_k\} = 0, \qquad E\{e_ie_j\} = \delta_{ij}
\tag{8.10}
$$

and the regulator

$$
u_k = -Ky_k
\tag{8.11}
$$

The transfer function between input $u$ and output $y$ may be computed as

$$
\hat H_P(e^{i\omega h}) = \frac{S_{yy}(i\omega)}{S_{uy}(i\omega)}
\tag{8.12}
$$
where the autospectrum $S_{yy}$ and the cross spectrum $S_{uy}$ are

$$
\begin{aligned}
S_{yy}(i\omega) &= \frac{1}{\left(1 + (a + bK)e^{-i\omega}\right)\left(1 + (a + bK)e^{i\omega}\right)} \\
S_{uy}(i\omega) &= \frac{-K}{\left(1 + (a + bK)e^{-i\omega}\right)\left(1 + (a + bK)e^{i\omega}\right)}
\end{aligned}
\tag{8.13}
$$

The estimated transfer function is

$$
\hat H_P(e^{i\omega h}) = \frac{S_{yy}(i\omega)}{S_{uy}(i\omega)} = -\frac{1}{K}
\tag{8.14}
$$

The same conclusion as in Examples 8.1 and 8.2 applies to this example: the calculated transfer function does not estimate the control object. Instead, the estimated transfer function reflects the properties of the regulator and gives an estimate of the inverse of the regulator transfer function (compare (8.4)). ■

The examples given in this section indeed point to several difficult problems in the context of closed-loop control. However, all of the examples rely on the fact that the external input $u_c$ (see Fig. 8.3) was equal to zero, and it can be expected that better identifiability would result with a nonzero reference input.


Example 8.4—Identification of a system using a reference signal 
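Consider a system described by the equation

$$
y_{k+1} = -ay_k + bu_k + v_{k+1}
\tag{8.15}
$$

Assume that the regulator is a simple proportional controller with

$$
u_k = -K(y_k + u_{c,k})
\tag{8.16}
$$

In this case it is no problem to find a sequence $u_c$ such that the matrix

$$
\Phi = \begin{pmatrix} y_0 & u_0 \\ \vdots & \vdots \\ y_{N-1} & u_{N-1} \end{pmatrix} = \begin{pmatrix} y_0 \\ \vdots \\ y_{N-1} \end{pmatrix}\begin{pmatrix} 1 & -K \end{pmatrix} + \begin{pmatrix} 0 & -Ku_{c,0} \\ \vdots & \vdots \\ 0 & -Ku_{c,N-1} \end{pmatrix}
\tag{8.17}
$$

has full rank (i.e., $\operatorname{rank}(\Phi^T\Phi) = 2$), so that there is a unique solution $\theta$ to the normal equations $(\Phi^T\Phi)\theta - \Phi^Ty = 0$. ■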
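Examples 8.2 and 8.4 can be reproduced with a short simulation. The sketch below uses hypothetical values $a = 0.5$, $b = 1$, $K = 0.4$, Gaussian noise with standard deviation 0.1, and a square-wave reference with a period of 50 samples. With $u_c = 0$ the two columns of $\Phi$ are proportional and the normal equations are singular; with the reference signal they are not, and least squares recovers $a$ and $b$.

```python
import random

def simulate(a, b, K, n, reference, noise_std=0.1, seed=0):
    """Closed loop of (8.15)-(8.16): y_{k+1} = -a y_k + b u_k + v_{k+1},
    u_k = -K (y_k + uc_k). Returns regressors (y_k, u_k) and targets y_{k+1}."""
    rng = random.Random(seed)
    y, phi, targets = 0.0, [], []
    for k in range(n):
        u = -K * (y + reference(k))
        y_next = -a * y + b * u + rng.gauss(0.0, noise_std)
        phi.append((y, u))
        targets.append(y_next)
        y = y_next
    return phi, targets

def fit(phi, targets):
    """Least squares for y_{k+1} = theta1*y_k + theta2*u_k via the 2x2 normal equations.
    Returns (normalized determinant, (a_hat, b_hat) or None if numerically singular)."""
    g11 = sum(p[0] * p[0] for p in phi)
    g12 = sum(p[0] * p[1] for p in phi)
    g22 = sum(p[1] * p[1] for p in phi)
    r1 = sum(p[0] * t for p, t in zip(phi, targets))
    r2 = sum(p[1] * t for p, t in zip(phi, targets))
    det = g11 * g22 - g12 * g12
    normalized = det / (g11 * g22)       # 0 when the two columns of Phi are proportional
    if normalized < 1e-9:
        return normalized, None
    theta1 = (g22 * r1 - g12 * r2) / det
    theta2 = (g11 * r2 - g12 * r1) / det
    return normalized, (-theta1, theta2)  # theta1 = -a, theta2 = b

def square(k):
    """Square-wave reference u_c with a period of 50 samples."""
    return 1.0 if (k // 25) % 2 else -1.0

a_true, b_true, K, N = 0.5, 1.0, 0.4, 4000   # hypothetical values

cond0, est0 = fit(*simulate(a_true, b_true, K, N, lambda k: 0.0))   # u_c = 0
cond1, est1 = fit(*simulate(a_true, b_true, K, N, square))          # square-wave u_c
print("u_c = 0:         normalized det", cond0, "estimate", est0)
print("square-wave u_c: normalized det", cond1, "estimate", est1)
```

The normalized determinant $\det(\Phi^T\Phi)/(g_{11}g_{22})$ is used here as a scale-free indicator of collinearity; in the first case it equals zero up to rounding.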
Figure 8.3 Indirect (left) and direct identification (right).


8.4 DIRECT OR INDIRECT IDENTIFICATION? 


Identification of a transfer function between the control input $u$ and the observation $y$ of a system in closed-loop operation is called direct identification, because the identification provides an estimate of the subsystem between control input and observations of output (see Fig. 8.3).

Identification may also proceed from the external (reference) input $u_c$ to the observations $y$ (indirect identification) in cases where both the regulator and the closed-loop system are linear systems. If we assume that the feedback transfer function $G_R(s)$ is linear, time-invariant, and known, then we can suggest the procedure

1. Identify the closed-loop system $G$ from $u_c$ to $y$.

2. Compute the open-loop transfer function

$$
G_P(i\omega) = \frac{G(i\omega)}{1 - G_R(i\omega)G(i\omega)}
\tag{8.18}
$$


Notice that this procedure presupposes a time-invariant regulator $G_R$, which presents some problems for the application of discrete-time regulators. A condition for indirect identification is therefore that there is well-defined synchronization between measurement and control. ■
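Step 2 is plain complex arithmetic at each frequency. The sketch below assumes that the loop is closed as $u = u_c - G_Ry$, so that the identified closed-loop transfer function is $G = G_P/(1 + G_RG_P)$; this loop arrangement is an assumption, and a different placement of $u_c$ would change the formula. The plant and the PI regulator are hypothetical, and the check only confirms that (8.18) returns $G_P$ when $G$ is known exactly.

```python
def closed_loop(G_P, G_R):
    """Closed loop from u_c to y, assuming u = u_c - G_R*y (reference at the plant input)."""
    return G_P / (1 + G_R * G_P)

def open_loop_from_closed(G, G_R):
    """Eq. (8.18): recover G_P from the closed-loop G and the known regulator G_R."""
    return G / (1 - G_R * G)

def plant(s):
    """Hypothetical plant G_P(s) = 2 / (s^2 + 0.5 s + 1)."""
    return 2.0 / (s * s + 0.5 * s + 1.0)

def regulator(s):
    """Hypothetical PI regulator G_R(s) = 0.8 + 0.3 / s."""
    return 0.8 + 0.3 / s

for w in (0.1, 1.0, 5.0):
    s = 1j * w
    G = closed_loop(plant(s), regulator(s))       # what step 1 would identify
    G_P = open_loop_from_closed(G, regulator(s))  # step 2
    print(w, abs(G_P - plant(s)))
```

In practice $G$ is an estimate, and the division in (8.18) amplifies its errors by the factor $|1 + G_RG_P|^2$, that is, most strongly at frequencies where the loop gain is high.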


Example 8.5—Spectral analysis in closed-loop systems 
It is feasible to use indirect identification also in the context of spectral analysis. Consider the following example with closed-loop operation according to Fig. 8.4. Assume that the noise $w$ and the external input $r$ are uncorrelated and that the control law is

$$
U(s) = G_{ff}(s)R(s) - G_r(s)Y(s)
\tag{8.19}
$$
Figure 8.4 Indirect spectral analysis with reference signal.


where $R(s) = \mathcal{L}\{r(t)\}$, and where $U = \mathcal{L}\{u\}$ and $Y = \mathcal{L}\{y\}$ denote the control variable and the system output, respectively. The cross spectrum between the reference variable $r$ and the observed variable $y$ is

$$
S_{yr}(i\omega) = G_P(i\omega)S_{ur}(i\omega) + G_v(i\omega)S_{wr}(i\omega)
\tag{8.20}
$$

If $r$ and $w$ are uncorrelated, then $S_{wr} = 0$ and it holds that

$$
S_{yr}(i\omega) = G_P(i\omega)S_{ur}(i\omega)
\tag{8.21}
$$

The input-output transfer function may thus be estimated as

$$
\hat G_P(i\omega) = \frac{S_{yr}(i\omega)}{S_{ur}(i\omega)}
\tag{8.22}
$$

■

8.5 CHOICE OF INPUT

The result of identification is contingent upon a careful choice of input to the 
system under investigation. For example, the final result of identification can 
be evaluated as the sum of residuals obtained by the least-squares method or 
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the maximum-likelihood method. The sum of squared residuals is 

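$$\sum_{k=1}^{N}\varepsilon_k^2 = \sum_{k=1}^{N}(y_k - \hat y_k)^2 = \frac{1}{N}\sum_{k=1}^{N}|Y_k - \hat Y_k|^2 \tag{8.23}$$

where the last equality follows from the Parseval theorem, with $Y_k$ and $\hat Y_k$ denoting
the discrete Fourier transforms of the observed and predicted outputs. If we assume that
the input-output relation is $Y(s) = G(s)U(s) + V(s)$ and $\hat Y(s) = \hat G(s)U(s)$, and
if the noise $V(s)$ is zero, then

$$\sum_{k=1}^{N}\varepsilon_k^2 = \frac{1}{N}\sum_{k=1}^{N}\left|\left(G(i\omega_k) - \hat G(i\omega_k)\right)U_k\right|^2 \tag{8.24}$$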

for frequency points $\{\omega_k\}$ obtained from the discrete Fourier transform. The
accuracy of the transfer function estimate thus depends on the spectrum of the
input. Frequency domain methods for input design therefore consist of choosing
a suitable input spectrum. For instance, a suitable input spectrum for
identification aimed at control system design has much input energy (i.e., $|U_k|^2$ is
large) at frequencies around the bandwidth frequency of the investigated
system. (This statement requires, of course, further theoretical motivation,
which follows in the sensitivity analysis below.) Such an approach is necessarily
iterative, as the system bandwidth is not known a priori.
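The identity (8.24) can be verified numerically. The Python sketch below assumes a noise-free FIR system and one period of a periodic input, so that the steady-state outputs are exact circular convolutions and the DFT relations hold without leakage; the function name and the FIR coefficients used in the check are hypothetical.

```python
import numpy as np

def residual_energy_time_and_freq(g_true, g_hat, u):
    """Evaluate both sides of (8.24) for a noise-free FIR system.

    One period of a periodic input u is used, so the steady-state outputs
    are circular convolutions and the DFT relations are exact.
    """
    N = len(u)
    U = np.fft.fft(u)
    G = np.fft.fft(g_true, N)                # G(i*omega_k), omega_k = 2*pi*k/N
    G_hat = np.fft.fft(g_hat, N)             # model frequency response
    y = np.real(np.fft.ifft(G * U))          # true output over one period
    y_hat = np.real(np.fft.ifft(G_hat * U))  # model output over one period
    time_sum = np.sum((y - y_hat) ** 2)                  # left side: sum of eps_k^2
    freq_sum = np.sum(np.abs((G - G_hat) * U) ** 2) / N  # right side via Parseval
    return time_sum, freq_sum
```

If `u` is replaced by a slowly varying input, $|U_k|^2$ is concentrated at low frequencies, and model errors at high frequencies then contribute almost nothing to either sum.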

The idea of weighting in the frequency domain may also be applied to filtering 
of regressors and observations by means of data filters with high transmission 
at the desired frequency range. Such methods may often be reformulated via 
the Parseval relation as weighted least-squares methods. 

Apart from frequency domain methods there is a general interest in a measure
on the input spectrum that determines whether the input is of sufficient
complexity. One such approach is the criterion of persistent excitation:


Definition 8,1—Persistency of excitation (Astrom) 

A signal $u$ fulfils the condition of persistent excitation of order $n$ if the limits
$C_{uu}(\tau)$, $\tau = 0, 1, \ldots, n-1$, exist and if the matrix

$$\begin{pmatrix} C_{uu}(0) & C_{uu}(1) & \cdots & C_{uu}(n-1) \\ C_{uu}(-1) & C_{uu}(0) & \cdots & C_{uu}(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ C_{uu}(1-n) & C_{uu}(2-n) & \cdots & C_{uu}(0) \end{pmatrix} \tag{8.26}$$

is positive definite.

Persistent excitation is sufficient to obtain consistent estimates for the least-squares
method and maximum-likelihood identification. The condition (8.25)
implies a condition on the autocorrelation $C_{uu}(\tau)$ and thus on the autospectrum
$S_{uu}(i\omega)$, which must be nonzero for at least $n$ different frequencies $\omega_1, \ldots, \omega_n$. In fact, as it is
difficult to formulate criteria for experimental verification of persistency of
excitation, this definition can be regarded as a stipulative definition of some
sufficient conditions for consistency.
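A numerical version of Definition 8.1 can be sketched in Python. The sketch estimates $C_{uu}(\tau)$ by sample averages of $u_{k+\tau}u_k$ without mean removal (an assumption; for zero-mean signals it agrees with the covariance (8.59) used in the exercises), forms the Toeplitz matrix (8.26), and declares the signal persistently exciting of order $n$ if the smallest eigenvalue exceeds a relative tolerance. The helper names and the tolerance are hypothetical, since finite data never give exact positive definiteness or exact singularity.

```python
import numpy as np

def sample_covariance(u, max_lag):
    """C_uu(tau) ~ (1/(N - tau)) * sum_k u[k + tau] * u[k], tau = 0..max_lag (no mean removal)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    return np.array([np.dot(u[tau:], u[:N - tau]) / (N - tau) for tau in range(max_lag + 1)])

def covariance_matrix(u, n):
    """n x n Toeplitz matrix [C_uu(i - j)] of (8.26); C_uu(-tau) = C_uu(tau) for real u."""
    c = sample_covariance(u, n - 1)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return c[lags]

def is_persistently_exciting(u, n, rel_tol=1e-3):
    """Numerical PE test: smallest eigenvalue large relative to the largest."""
    eig = np.linalg.eigvalsh(covariance_matrix(u, n))  # ascending order
    return bool(eig[0] > rel_tol * eig[-1])
```

A single sinusoid, for instance, passes the test for $n = 2$ but fails it for $n = 3$, in line with its spectrum being nonzero at only the two frequencies $\pm\omega_0$.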

Methods for experimental verification of the number of identifiable parameters
from a set of data organized as the regressor matrix

$$\Phi_N = \begin{pmatrix}\Phi_y & \Phi_u\end{pmatrix} \in \mathbb{R}^{N\times(n_a+n_b)} \tag{8.27}$$

include the singular value decomposition (SVD)

$$\Phi_N = U\Sigma V^T, \qquad \Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp}, 0, \ldots, 0) \tag{8.28}$$

where the number of nonzero singular values determines the number of parameters
for which the normal equations have a unique solution. The SVD
may of course also be applied to the matrix $\Phi_u$, which thus provides a
characterization of the input similar to persistency of excitation.
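The rank test based on (8.28) can be sketched with NumPy's SVD. The example builds only the input block $\Phi_u$ for a FIR structure with a hypothetical helper `fir_regressor` and counts singular values above a relative tolerance; the tolerance $10^{-8}$ is an arbitrary choice for noise-free data and would have to be larger for measured signals.

```python
import numpy as np

def fir_regressor(u, n):
    """Rows (u[k-1], ..., u[k-n]) for k = n..N-1: the input block Phi_u."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([u[n - j: len(u) - j] for j in range(1, n + 1)])

def numerical_rank(Phi, rel_tol=1e-8):
    """Number of singular values above rel_tol * sigma_max, cf. (8.28)."""
    s = np.linalg.svd(Phi, compute_uv=False)  # singular values, descending
    return int(np.sum(s > rel_tol * s[0]))
```

A single sinusoid gives columns that satisfy an exact linear relation, so at most two FIR parameters can be determined from it, whereas a white-noise input makes all three columns independent.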


‘Optimal’ input signals 


A relevant question is whether there is any 'optimal' input signal, in some
meaningful sense, to choose for identification purposes. One approach to the
formulation of suitable conditions of optimality is to start by considering the
Cramér-Rao lower bound on the parameter covariance


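$$\mathcal{E}\{(\hat\theta - \theta)(\hat\theta - \theta)^T\} \ge M^{-1} \tag{8.29}$$

where the Fisher information matrix $M$ is defined as

$$M = \mathcal{E}\left\{\left(\frac{\partial \log p(Y|\theta)}{\partial\theta}\right)^T\left(\frac{\partial \log p(Y|\theta)}{\partial\theta}\right)\right\} \tag{8.30}$$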


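Let us consider the case $Y = \Phi\theta + v$ with $\mathcal{E}\{v\} = 0$. Assuming that $v$ is
normally distributed noise with covariance matrix $\Sigma = \mathcal{E}\{vv^T\} = \sigma^2 I$, this
gives

$$\begin{aligned} p(Y|\theta,\Sigma) &= (2\pi)^{-N/2}(\det\Sigma)^{-1/2}\exp\left(-\tfrac{1}{2}(Y-\Phi\theta)^T\Sigma^{-1}(Y-\Phi\theta)\right) \\ &= (2\pi)^{-N/2}(\det\sigma^2 I)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(Y-\Phi\theta)^T(Y-\Phi\theta)\right) = p(Y|\theta,\sigma^2) \end{aligned} \tag{8.31}$$

and

$$\begin{aligned} \frac{\partial \log p(Y|\theta,\sigma^2)}{\partial\theta} &= \frac{1}{\sigma^2}(Y-\Phi\theta)^T\Phi \\ \frac{\partial \log p(Y|\theta,\sigma^2)}{\partial\sigma^2} &= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}(Y-\Phi\theta)^T(Y-\Phi\theta) \end{aligned} \tag{8.32}$$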


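Calculation of the Fisher information matrix for normally distributed noise
$\Sigma = \sigma^2 I$, with respect to the parameters $(\theta^T, \sigma^2)^T$, gives

$$M = \begin{pmatrix} \dfrac{\Phi^T\Phi}{\sigma^2} & 0 \\ 0 & \dfrac{N}{2\sigma^4} \end{pmatrix} \tag{8.33}$$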
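In this linear Gaussian case the least-squares estimate of $\theta$ coincides with the maximum-likelihood estimate and attains the bound (8.29), with covariance $\sigma^2(\Phi^T\Phi)^{-1}$, the inverse of the $\theta$-block $\Phi^T\Phi/\sigma^2$. The Monte Carlo sketch below makes this concrete under stated assumptions: a hypothetical random regressor matrix, parameter vector, and noise level, with helper names chosen for illustration.

```python
import numpy as np

def fisher_information_theta(Phi, sigma2):
    """theta-block of the Fisher information (8.33): Phi^T Phi / sigma^2."""
    return Phi.T @ Phi / sigma2

def monte_carlo_ls_covariance(Phi, theta, sigma2, runs=2000, seed=0):
    """Empirical covariance of least-squares estimates over repeated noise realizations."""
    rng = np.random.default_rng(seed)
    N = Phi.shape[0]
    estimates = np.empty((runs, len(theta)))
    for i in range(runs):
        Y = Phi @ theta + rng.normal(scale=np.sqrt(sigma2), size=N)  # Y = Phi*theta + v
        estimates[i] = np.linalg.lstsq(Phi, Y, rcond=None)[0]
    return np.cov(estimates, rowvar=False)
```

The empirical variances should approach the diagonal of the inverse of `fisher_information_theta(Phi, sigma2)` as the number of runs grows.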


An attractive approach is to choose the experiment input $U$ so that the Cramér-Rao
lower bound $M^{-1}$ is minimized in some sense under the input constraint

$$\frac{1}{N}\sum_{k=1}^{N} u_k^2 = 1 \tag{8.34}$$

In order to formulate this as a well-posed optimization problem it is natural
to minimize some scalar function or norm of $M^{-1}$ under the constraint (8.34)
and to choose the experiment input $U$ that minimizes this scalar optimization
criterion. Consider, for instance, optimization criteria of the type

$$J_1(U) = \mathrm{tr}(WM^{-1}), \qquad J_2(U) = -\log\det M \tag{8.35}$$

where $W$ is some weighting matrix. As $M$ is a function of both the inputs and
outputs, it is obviously necessary to know the system, or to actually perform
the desired experiment, before it can be calculated. This unfortunate problem
formulation presupposes data that are not available, and the intended
optimization of $J_1$ or $J_2$ in Eq. (8.35) has to be replaced by iterative or
approximate methods. For instance, pseudorandom binary sequences (PRBS)
are often close to optimizing the criteria of Eq. (8.35) under the constraint
of Eq. (8.34).
Figure 8.5 A circuit to generate pseudorandom binary sequences with low
autocorrelation.

Example 8.6—How to generate pseudorandom binary sequences 
One motivation is the need for special test signals with a given amplitude,
low autocorrelation, and a period that can be chosen arbitrarily
long.

Such test signals can be generated using feedback shift registers
according to Fig. 8.5. These circuits operate in discrete time and the registers
take on values 0 or 1. The addition operation $\oplus$ in the circuit is modulo-two
addition, according to which

$$x \oplus y = \begin{cases} 0, & x = 0,\ y = 0 \\ 1, & x = 1,\ y = 0 \\ 1, & x = 0,\ y = 1 \\ 0, & x = 1,\ y = 1 \end{cases} \tag{8.36}$$


The interconnections and the feedback in the circuit are conveniently
described by polynomials

$$p(z^{-1}) = 1 \oplus z^{-1} \oplus z^{-3} \oplus z^{-4} \oplus z^{-13} \tag{8.37}$$

where the polynomial degree is equal to the number of states in the shift
register (cf. Fig. 8.5). Polynomials that give long sequences of low autocorrelation
are irreducible polynomials, i.e., polynomials that cannot be factored into
polynomials of lower degree greater than 0; the maximum period is obtained for
the primitive polynomials, a subclass of the irreducible ones.

The interconnections of the circuit may also be described by an octal number
or a binary number (cf. Fig. 8.5).
Power of 2      14 13 12 | 11 10  9 |  8  7  6 |  5  4  3 |  2  1  0
Coefficients     0  1  0 |  0  0  0 |  0  0  0 |  0  1  1 |  0  1  1
Octal               2    |     0    |     0    |     3    |     3

The following octal numbers describe the interconnections for
some suitable (i.e., irreducible) polynomials that generate PRBS.

 n    Octal code
10    2011
11    4055
12    10123
13    20033
14    42103
15    100003
16    210013
17    400011
18    1000201
19    2000047
20    4000011

As the circuit used for generation of the PRBS operates without external
inputs and has a finite number of states, the PRBS
generator necessarily provides a periodic output. The period of the output depends
upon $n$, i.e., the number of shift register stages of the circuit. The period of
the resulting autocorrelation function is

$$\text{Period} = (2^n - 1)h \tag{8.38}$$

where $h$ is the sampling interval. This indicates how $n$ should be chosen in order to provide a long enough
sequence of low autocorrelation. ■
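A feedback shift register of this kind can be sketched in Python. The sketch assumes that bit $i$ of the octal code is the coefficient of $z^{-i}$ in $p(z^{-1})$, as in the binary/octal table above (so "20033" gives (8.37)), and forms each new output as the modulo-two sum of the delayed outputs selected by the terms with $i \ge 1$. The function names are hypothetical; the check uses the $n = 10$ entry of the table and tests the period $2^n - 1$ and the two-valued periodic autocorrelation of the $\pm 1$ signal.

```python
import numpy as np

def prbs(octal_code, length, seed=1):
    """Shift-register sketch: y_k = XOR of y_{k-i} over the z^{-i} terms (i >= 1)
    of p(z^{-1}), where bit i of the octal code is the coefficient of z^{-i}
    (e.g. "20033" -> 1 + z^-1 + z^-3 + z^-4 + z^-13)."""
    poly = int(octal_code, 8)
    n = poly.bit_length() - 1                      # number of register stages
    delays = [i for i in range(1, n + 1) if (poly >> i) & 1]
    reg = [(seed >> i) & 1 for i in range(n)]      # reg[i - 1] holds y_{k-i}
    if not any(reg):
        raise ValueError("all-zero initial state gives an all-zero output")
    out = np.empty(length, dtype=int)
    for k in range(length):
        bit = 0
        for i in delays:
            bit ^= reg[i - 1]                      # modulo-two addition (8.36)
        out[k] = bit
        reg = [bit] + reg[:-1]                     # shift the register one stage
    return out

def to_binary_signal(bits, amplitude=1.0):
    """Map register bits {0, 1} to test-signal levels {+a, -a}."""
    return amplitude * (1 - 2 * bits)
```

For a maximum-length sequence with period $N = 2^n - 1$, the periodic autocorrelation of the $\pm 1$ signal equals $N$ at lag zero and $-1$ at every other lag, which is the low-autocorrelation property referred to above.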

Example 8.7—Fitting of a weighting function 

As an example of design of an optimizing input we consider the case of a
finite impulse response described by the transfer function

$$H(z) = b_1 z^{-1} + \cdots + b_n z^{-n} \tag{8.39}$$

with the unknown coefficients

$$\theta = (b_1 \ \cdots \ b_n)^T \tag{8.40}$$

Let

$$U_i = (u_{n-i} \ \cdots \ u_{N-i})^T, \qquad i = 0, 1, \ldots, n \tag{8.41}$$
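The linear regression problem $Y = \Phi\theta + V$ may then be formulated with

$$\Phi = \begin{pmatrix} u_{n-1} & \cdots & u_0 \\ \vdots & & \vdots \\ u_{N-1} & \cdots & u_{N-n} \end{pmatrix} = \begin{pmatrix} U_1 & \cdots & U_n \end{pmatrix} \tag{8.42}$$

and

$$\Phi^T\Phi = \begin{pmatrix} U_1^TU_1 & \cdots & U_1^TU_n \\ \vdots & \ddots & \vdots \\ U_n^TU_1 & \cdots & U_n^TU_n \end{pmatrix} \tag{8.43}$$

Let the optimization problem be formulated as

$$\text{minimize } J_2(U) = -\log\det M = -\log\det\Phi^T\Phi + \text{const} \quad \text{subject to} \quad \frac{1}{N}\sum_{i=1}^{n} U_i^TU_i = 1 \tag{8.44}$$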

This problem has a solution (see the proof below) which specifies that all
eigenvalues of the matrix $\Phi^T\Phi$ should be equal in order to allow for an optimal
solution to Eq. (8.44). A pseudorandom binary sequence with $U_i^TU_j \approx
(N - n)\sigma_u^2\delta_{ij}$ for some magnitude $\sigma_u$ thus provides a good approximate
solution to the problem of finding an optimal input.

Proof: The magnitude constraint $\frac{1}{N}\sum_i U_i^TU_i = 1$ can be reformulated as
the equivalent condition

$$\mathrm{tr}(\Phi^T\Phi) = N \tag{8.45}$$

where tr denotes the matrix trace (see Appendix A). Now denote the eigenvalues
of $\Phi^T\Phi$ by $\lambda_1, \ldots, \lambda_n$, where all $\lambda_i > 0$. It then follows (except for a neglected
constant term in $J_2$) that

$$J_2 = -\log\det\Phi^T\Phi = -\log\prod_{i=1}^{n}\lambda_i = -\sum_{i=1}^{n}\log\lambda_i$$

$$N = \mathrm{tr}(\Phi^T\Phi) = \sum_{i=1}^{n}\lambda_i \tag{8.46}$$

The optimization criterion $J_2$ may thus be replaced by

$$J(\lambda_1, \ldots, \lambda_n) = -\sum_{i=1}^{n}\log\lambda_i + \mu\left(\sum_{i=1}^{n}\lambda_i - N\right) \tag{8.47}$$
Figure 8.6 Transfer function uncertainty $\Delta G_p(s)$ of a control object in closed-loop
control with the regulator $G_R(s)$.

where the constraint of Eq. (8.46) is adjoined by means of the Lagrange
multiplier $\mu$. Conditions for an extremum can be evaluated by means of the
gradient

$$0 = \frac{\partial J}{\partial\lambda_i} = -\frac{1}{\lambda_i} + \mu \quad\Rightarrow\quad \lambda_i = \frac{1}{\mu}, \qquad i = 1, \ldots, n \tag{8.48}$$

By substituting $\lambda_i = 1/\mu$ into $J$ and evaluating the extremum with respect
to $\mu$, one finds the optimal $\mu = n/N$ and, thus, the optimal $\lambda_i^* = N/n$,
$i = 1, 2, \ldots, n$. Finally, evaluation of $\partial^2 J/\partial\lambda_i\partial\lambda_j$ shows that the extremum is a
minimum. ■
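The conclusion of Example 8.7 can be checked numerically. The Python sketch below assumes a FIR model with $n = 5$ taps, rescales the regressor matrix so that $\mathrm{tr}(\Phi^T\Phi)$ equals the number of regression rows (standing in for (8.45)), and compares $J_2 = -\log\det\Phi^T\Phi$ with the lower bound $-n\log(N/n)$ that is attained when all eigenvalues equal $N/n$. A random $\pm 1$ binary sequence serves as a stand-in for a PRBS, and first-order low-pass filtered noise serves as a poorly exciting alternative; both inputs and all names are illustrative.

```python
import numpy as np

def fir_regressor(u, n):
    """Rows (u[k-1], ..., u[k-n]) for k = n..N-1, i.e. the matrix Phi of (8.42)."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([u[n - j: len(u) - j] for j in range(1, n + 1)])

def d_criterion(u, n):
    """J2 = -log det(Phi^T Phi) after scaling to tr(Phi^T Phi) = N, and its lower bound."""
    Phi = fir_regressor(u, n)
    N = Phi.shape[0]
    Phi = Phi * np.sqrt(N / np.trace(Phi.T @ Phi))   # enforce the trace constraint
    _, logdet = np.linalg.slogdet(Phi.T @ Phi)
    return -logdet, -n * np.log(N / n)               # criterion, value at lambda_i = N/n
```

The binary input gives nearly equal eigenvalues and comes close to the bound, while the strongly correlated low-pass input spreads the eigenvalues and is clearly worse in the sense of (8.44).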


8.6 PARAMETER UNCERTAINTY AND CONTROL

The significance of parameter uncertainty in regulator design is
reflected in the concepts of gain margin and phase margin, which, in terms of
transfer function gain and phase, describe the possible parameter variation
or parameter uncertainty of a control object for which the closed-loop system
maintains stability. Assume that the transfer function uncertainty may be
described by

$$G_p(s) + \Delta G_p(s) = (I + L(s))G_p(s) \tag{8.49}$$

The transfer function uncertainty $L(s)$ expressed in Eq. (8.49) and in Fig. 8.6
is sometimes called an unstructured multiplicative uncertainty, which attracts
much interest in the domain of $H^\infty$-robust control. Assume that the parameter
uncertainty $L(s)$ can be characterized in the frequency domain by some bound
$m(\omega)$ so that

$$\bar\sigma[L(i\omega)] \le m(\omega) \tag{8.50}$$

where $\bar\sigma[L(i\omega)]$ denotes the maximum singular value of the multiplicative
uncertainty $L(i\omega)$, for all $\omega$. Furthermore, let $S(s)$ denote the sensitivity
function

$$S(s) = (I + G_p(s)G_R(s))^{-1} \tag{8.51}$$
Figure 8.7 Sensitivity function $S(s) = E(s)/V(s)$ for the open-loop system (dashed
line) and for the same system controlled by a PID-regulator (solid line) for the system
(7.38). Notice that the sensitivity of the PID-controlled system is low in the
low-frequency range and is highest around the bandwidth frequency, where the
closed-loop system is even more sensitive than the open-loop system.

which describes the transfer function between disturbance and output error.
Let the complementary sensitivity function be defined as

$$T(s) = G_p(s)G_R(s)(I + G_p(s)G_R(s))^{-1} \tag{8.52}$$

Small errors in the presence of the reference signal $r(t)$ and measurement
noise $v(t)$ are obtained if $S(s)$ is made "small", as the error $E(s)$ depends upon
$S(s)V(s)$ and $S(s)R(s)$. In addition, given a stable nominal closed-loop system,
stability is maintained in the presence of all possible uncertainties characterized by Eq.
(8.50) if and only if the complementary sensitivity function satisfies

$$\bar\sigma[T(i\omega)] < \frac{1}{m(\omega)} \tag{8.53}$$

for all $\omega$. An objective of control design is to minimize the sensitivity $S(s)$
to parameter uncertainties and parameter variations. This may be achieved
by choosing the regulator transfer function $G_R(s)$ according to the following
rules:

R1: Make $S(i\omega)$ small whenever $R(i\omega)$ or $V(i\omega)$ is large
R2: Keep $T(i\omega)$ small whenever $m(\omega)$ is large

The objective of identification is here to reduce the uncertainty bound $m(\omega)$ as
much as possible. A frequency domain signal weighting is valuable in order
to obtain a favorable accuracy distribution in the frequency domain.
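The rules R1 and R2 and the condition (8.53) can be evaluated on a frequency grid. The Python sketch below is restricted to the scalar case, where the maximum singular value reduces to the magnitude. The plant $G_p(s) = 1/(s(s+1))$, the regulator $G_R = 1$, and the constant uncertainty bounds $m(\omega)$ are hypothetical, and the grid test presumes that the nominal closed loop (here with characteristic polynomial $s^2 + s + 1$) is stable.

```python
import numpy as np

def loop_functions(num_p, den_p, num_r, den_r, omega):
    """S = 1/(1 + Gp*GR) and T = Gp*GR/(1 + Gp*GR), Eqs. (8.51)-(8.52), SISO case.
    Polynomials are coefficient lists in descending powers of s."""
    s = 1j * omega
    Gp = np.polyval(num_p, s) / np.polyval(den_p, s)
    Gr = np.polyval(num_r, s) / np.polyval(den_r, s)
    L = Gp * Gr                                   # loop transfer function
    return 1.0 / (1.0 + L), L / (1.0 + L)

def robustly_stable(T, m):
    """Grid check of (8.53): |T(i*omega)| * m(omega) < 1 at every grid point."""
    return bool(np.all(np.abs(T) * m < 1.0))
```

For this loop $|S|$ is small at low frequencies but exceeds 1 around $\omega \approx 1$, qualitatively like the PID-controlled case in Fig. 8.7. The peak of $|T|$ is $2/\sqrt{3} \approx 1.15$, so a constant bound $m$ must stay below roughly 0.87 for (8.53) to hold.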

Example 8-8—The sensitivity function of a motor drive 

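Consider the motor drive in Example 7.6 with the following system equations

$$\begin{aligned} \frac{dx(t)}{dt} &= \begin{pmatrix} 0 & 0 & k/J_1 \\ 0 & -d_2/J_2 & -k/J_2 \\ -1 & 1 & 0 \end{pmatrix} x(t) + \begin{pmatrix} 1/J_1 \\ 0 \\ 0 \end{pmatrix} u(t) - \begin{pmatrix} 0 \\ 1/J_2 \\ 0 \end{pmatrix} v(t) \\ y(t) &= \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} x(t) \end{aligned} \tag{7.38}$$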

The sensitivity function (i.e., the transfer function from disturbance V(s) to 
the error E(s) according to Eq. (8.51)) of the open-loop system is shown in Fig. 
8.7 (dotted line) and the sensitivity function of a closed-loop system controlled 
by a PID-regulator with state feedback and integral action 


U(s) = 


( 10 25 -25 j X(s) + -bTi(s) 
(To 25 -25 ) X(s)-10Qi(s) 


(8.54) 


is shown as a solid line. It is clear from the Bode diagrams that the PID-regulator
causes the sensitivity to decrease at lower frequencies, whereas it
might even increase in the frequency range around the bandwidth of the
system. For control systems analysis and design it is thus natural to develop
dynamic models with good accuracy in intermediate frequency ranges. ■

A closer investigation shows that it is a good identification strategy to make
$m(\omega)$ small in the frequency region where the sensitivity is large, i.e., where
$|G_p(i\omega)G_R(i\omega)| \approx 1$ and $\arg G_p(i\omega)G_R(i\omega) \approx -\pi$. One approach is to use this
information to make a weighted least-squares solution that minimizes the
Bode diagram uncertainty at the frequency ranges of high sensitivity. The
sensitivity information can also be used for design of an input with much
energy around the bandwidth frequency.
As some information is required to formulate good and precise experiments, it
is natural and often necessary to make iterated experiments. A standard
procedure of identification is to start with simple first-stage experiments,
followed by continued experiments (second-stage experiments)
with careful motivation as to the choice of instruments, method prerequisites,
choice of experimental inputs, and experiment duration. The sufficiency of the
experiments made and the termination of the experimental procedure are
usually determined by model validation criteria and/or by identification of major
sources of error that may put limitations on the result of system identification.
Details of these considerations are presented in the following sections.


First stage experiments 


It is valuable to clarify and state the purpose of the model at the very first
stage of identification. The requirements of model accuracy are different when
the model is to be used, for instance, in control design, simulator design,
fault detection, or simply for process knowledge. Also, prior knowledge is
valuable at early stages of identification as it gives clues to model complexity,
error sources, time variation of dynamics, and other important conditions.

The practical considerations involve operating conditions of instruments and
actuators, possible system limitations, and information about operation in
closed-loop control.

Simple experiments such as logging of data during normal operation, step 
disturbances, step changes of available inputs, and impulse disturbances are 
natural initial experiments. Correlation analysis of signals from available 
inputs and outputs is often a good method to determine basic dependencies 
between the measured variables. 

Evaluation of the simple experiments should focus on a basic qualitative
understanding of the system under consideration. In particular, it is valuable to
investigate the simple causal relations between available inputs and outputs.
This can be done with coherence analysis to determine the presence of
nonlinearities as well as disturbance magnitudes and spectral properties such as
periodic disturbances. Coherence analysis can also indicate to what extent a
linear model may be assumed and in what frequency ranges such a model would
be valid.
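Coherence analysis of this kind can be sketched in Python using the coherence function $\gamma_{yu}(\omega) = |S_{yu}|/\sqrt{S_{uu}S_{yy}}$ defined in Exercise 8.5, Eq. (8.60). The spectra are estimated by averaging Hann-windowed segment periodograms, a Welch-type choice made here for simplicity. The FIR system and noise level in the check are hypothetical: a noise-free linear relation gives coherence close to 1, while additive noise lowers it.

```python
import numpy as np

def coherence(u, y, seg=256):
    """gamma_yu(omega) = |S_yu| / sqrt(S_uu * S_yy) from segment-averaged periodograms."""
    win = np.hanning(seg)
    n_seg = len(u) // seg
    Suu = np.zeros(seg // 2 + 1)
    Syy = np.zeros(seg // 2 + 1)
    Syu = np.zeros(seg // 2 + 1, dtype=complex)
    for i in range(n_seg):
        sl = slice(i * seg, (i + 1) * seg)
        U = np.fft.rfft(win * u[sl])
        Y = np.fft.rfft(win * y[sl])
        Suu += np.abs(U) ** 2
        Syy += np.abs(Y) ** 2
        Syu += Y * np.conj(U)
    return np.abs(Syu) / np.sqrt(Suu * Syy)
```

Averaging over segments is essential: for a single segment the estimated coherence equals 1 at every frequency regardless of noise or nonlinearity.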

Dominating time constants in the output responses and low-frequency noise
can be evaluated from the impulse and step response tests. Also, the problems
of nonstationary input-output relationships arising from time-varying
dynamics of the system under investigation or from drift of instruments can often
be detected at this stage of identification. In addition, system time delays
and multivariable causal relations may often be determined from the simple
impulse and step response experiments and from coherence analysis.

It should be borne in mind that operational records and other process
documentation may reveal anomalies and undesirable events such as manual
interventions or recalibration of measurement devices during experiments.
The presence of such events may, if undetected, create great confusion in the
interpretation of results.

Planning of the second stage experiment session 

Before proceeding to the second phase of experiments it is valuable to make
decisions concerning the sampling interval, signal filtering, and desirable
signal levels with respect to nonlinearities, noise levels, and physical limitations
and saturations. Also, operational problems such as manual interventions
and calibration problems should be considered. It is important to interact
with personnel operating the systems under investigation, to explain the
purpose and needs of identification, and to discuss the experience obtained.

Continued experiments 

The second stage of experiments is characterized by a systematic investigation
of the subsystems involved in the identification and by the design and execution
of suitable experiments. A natural starting point is the instrument properties,
which involve accuracy, dynamics, noise, trends, calibration, and
analog-to-digital converters. A second point of concern is actuator properties such as
accuracy, dynamics, and limitations.

The first-stage experiments may have suggested the use of certain
identification methods. The necessary method prerequisites and other conditions
necessary for application of available methods should be considered at this
stage. The following tests are applicable to all parametric modeling.

Test of linearity can be approached by estimating models for different input
amplitudes. Also, the symmetry of response for inputs of the same form but
of opposite signs is valuable. Coherence tests may also be used to detect
nonlinearities in the frequency domain.

Test of time invariance can be approached ad hoc by evaluation of time-variant
properties from recordings obtained on different occasions. The presence of
trends in the data, however, is often a benign time variation that may be
removed by trend elimination.
Test of disturbance conditions and noise properties should focus on a
characterization of the noise independent of the input. Artifacts such as abnormal data
resulting from lost signals, outliers, and aliasing should be detected. Finally,
a test of the experiment conditions involves an investigation to ensure that the input
is correct and contains sufficient excitation, and to check for the presence
of feedback and possible interference due to discrete-time control.

Choice of input 

The form of the input is sometimes determined by the method, e.g., sinusoids
in frequency response analysis, step functions and impulses
in transient analysis, or PRBS in correlation analysis. The choice of input for
model-based methods relies on criteria of sufficient excitation for determination
of the spectral range of validity.

Amplitude of the input is chosen as a trade-off between signal-to-noise ratio
and nonlinearities. A large amplitude should be avoided when fitting a linear
model so that the test signal does not enter a nonlinear region of behavior.
However, the amplitude should be large enough to ensure a good
signal-to-noise ratio. A statistical motivation for a large amplitude is that the attainable
parameter accuracy is

$$\mathrm{Cov}\{\hat\theta\} \propto \frac{1}{\text{input power}} \tag{8.55}$$

A good rule of thumb for a minimum amplitude is that the effect of
the input on the output should at least be visible to the eye in a plot.

Frequency domain characteristics of test signals are that the input should be
sufficiently exciting, with an autospectrum $S_u(i\omega)$ that is not "too small." In
addition, the estimated input spectrum $G_u(i\omega)$ must not be "too small."


G u (ico) 


G yy 

G uu i.ito'j 


(8.56) 


An idea that is seldom possible to realize is the choice of an "optimal input",
which generally requires good knowledge of the transfer function that should
be identified.


Experiment duration 

The sampling frequency can only be chosen appropriately if there exists some
knowledge about the dynamics of the system. An important problem is to
choose the sampling interval to avoid aliasing and interference with periodic
discrete-time control. For a given sampling interval the "long" time constants
will appear as integrators in the model or, simply, as trends in data, whereas
the "short" time constants may remain undetected. The experiment duration
also affects both the attainable parameter accuracy

$$\mathrm{Cov}\{\hat\theta\} \propto \frac{1}{\text{experiment duration}} \tag{8.57}$$

and the spectral resolution. A reasonable rule of thumb is that the experiment
duration should be at least 5 to 10 times the longest time constant to be determined
in the identification. Should the estimated variance not decrease as the
experiment duration increases, one might suspect that the parameter estimates
are not statistically consistent. It should also be borne in mind that one
experiment seldom results in knowledge of the transfer function over a wider
frequency range than two or three decades. Finally, the experiment duration
is also a trade-off between data economy and experiment economy.
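The proportionalities (8.55) and (8.57) can be illustrated for a least-squares FIR fit, whose theoretical covariance is $\sigma^2(\Phi^T\Phi)^{-1}$. In the Python sketch below (hypothetical helper names, a random $\pm 1$ input, and $n = 3$ taps), doubling the input amplitude quadruples the input power and divides the covariance by exactly four. Repeating the same input record doubles the duration and roughly halves the covariance, up to a small edge effect from the regressor rows that straddle the junction.

```python
import numpy as np

def fir_regressor(u, n):
    """Rows (u[k-1], ..., u[k-n]) for k = n..N-1."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([u[n - j: len(u) - j] for j in range(1, n + 1)])

def ls_covariance(u, n, sigma2=1.0):
    """Theoretical least-squares covariance sigma^2 (Phi^T Phi)^{-1} for a FIR model."""
    Phi = fir_regressor(u, n)
    return sigma2 * np.linalg.inv(Phi.T @ Phi)
```

The sketch covers only the noise-induced variance for a correctly specified model; it says nothing about bias from nonlinearities, which is the other side of the amplitude trade-off discussed above.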

As a general conclusion it might be stated that the purpose of the model affects 
the choice of experiment condition. Also, the experiment conditions should be 
chosen similar to those under which the model is intended to be used. 


BIBLIOGRAPHY AND REFERENCES 

Sensitivity functions and $H^\infty$-robust control are described in

- G. Stein and M. Athans, "The LQG/LTR procedure for multivariable
feedback control design," IEEE Trans. Autom. Control, Vol. AC-32, 1987,
pp. 105-114.

- J.M. Maciejowski, Multivariable Feedback Design. Reading, MA: Addison-Wesley,
1989.

An approach to identification theory directed toward support of robust control
system design is found in

- G.C. Goodwin, M. Gevers, and B. Ninness, "Quantifying the error in
estimated transfer functions with application to model order selection,"
IEEE Trans. Autom. Control, Vol. 37, 1992, pp. 913-928.

Properties of identification in closed-loop operation are described in

- P.E. Wellstead and J.M. Edmunds, "Least-squares identification of
closed-loop systems," Int. J. Control, Vol. 21, 1975, pp. 689-699.





- I. Gustafsson, L. Ljung, and T. Söderström, "Identification of processes
in closed loop: identification and accuracy aspects," Automatica, Vol.
13, 1977, pp. 59-75.

- B.D.O. Anderson and M. Gevers, "Representations of jointly stationary
stochastic feedback processes," Int. J. Control, Vol. 33, 1981, pp. 777-809.

Several methods of identification of time series are described in

- K.J. Åström, "Maximum-likelihood and prediction error methods,"
Automatica, Vol. 16, 1980, pp. 551-57.

Generation of pseudorandom binary sequences by polynomial methods is
described in

- W.W. Peterson, Error-Correcting Codes. New York: John Wiley, 1961.


8.9 EXERCISES 

8.1 Excitation signals—Sinusoids 

The transfer function of an unknown system is to be measured using 
a sinusoidal test signal. Both the creation of the test signal and the 
measurement are done using a computer. The sampled behavior of the 
computer will make the test signal differ from a perfect sinusoid. Determine
how many samples N are needed per period in order to make the
signal close to a sinusoid. Interpret close as

a. the energy in the strongest harmonic frequency component being 
less than 1% of the energy at the fundamental frequency. 

b. having more than 99% of its energy at the fundamental frequency. 

8.2 Excitation signals—White noise 

One problem with using white (broad-band) noise as an excitation signal
is its amplitude. There is no guarantee that the signal level will be
within fixed limits, and it is easy to get saturation of the process input
signal. Suppose the input signal to a certain process is limited to the
amplitude range $[-1.0, 1.0]$. The input signal is chosen as a normally
distributed white-noise sequence with zero mean and variance $\sigma^2$. What
is the largest $\sigma^2$ one can tolerate if the risk of input signal saturation
should be less than 0.01?
8.3 Excitation signals - pseudorandom binary sequence (PRBS)

PRBS is an easily generated signal with almost "white" properties. A
PRBS is generated using feedback around a shift register. The length
of the shift register determines the period of the sequence. A length-N
register results in a sequence with period $2^N - 1$.

a. The sequences generated by shift registers of length 2 and 3,
respectively, look as follows

length 2: ..., 1, −1, 1, 1, −1, 1, ...

length 3 : ,1, — 1, —1,1» J.» 1» 1» It It It It It It 1* 

• v ..V . - v _ ^ 


(8.58) 


Determine the autocovariance functions $C_2(\tau)$ and $C_3(\tau)$ for the two
sequences. What do you think $C_N(\tau)$ looks like?

Remark: Although the PRBS is a deterministic signal it is still possible
to define the covariance function as

$$m = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N} u(t), \qquad C_{uu}(\tau) = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}\left(u(t+\tau) - m\right)\left(u(t) - m\right) \tag{8.59}$$

where $m$ is the mean value. It is also possible to change the PRBS to a
stochastic signal by introducing a stochastic phase (i.e., the signal form
is known but the starting point in the sequence is unknown).

b. Determine the autospectra $S_2(i\omega)$ and $S_3(i\omega)$ of the two sequences
in a. What do you think $S_N(i\omega)$ looks like?

8.4 If the PRBS in Exercise 8.3 is to be used as excitation signal it has to be 
fed out from the computer through the digital-to-analog converter. How 
does that change the spectrum of the signal? 

8.5 Coherence function 

The coherence function $\gamma_{yu}$ between signals $u$ and $y$ is defined as

$$\gamma_{yu}(\omega) = \frac{|S_{yu}(i\omega)|}{\sqrt{S_{uu}(i\omega)S_{yy}(i\omega)}} \tag{8.60}$$

Suppose we are going to identify the system G in Fig. 8.8. The input
signal is u, the output signal is y, and n is a noise signal. Express $S_{yu}$
in terms of $G$, $S_{uu}$, and $S_{nn}$ such that it is obvious that $\gamma_{yu}(\omega)$ can be
used to judge if the excitation signal is sufficient for identification.

Figure 8.8 Identification experiment in Exercise 8.5.


8.6 Consider an identification experiment where a system was excited with a
broad-band input signal. The transfer function was calculated using the
spectra of the input u and output y. To be able to judge the quality of the
estimate, the coherence function $\gamma_{yu}(\omega)$ was calculated. Both the transfer
function and the coherence function are shown in Fig. 8.9. Using this
diagram it was concluded that the system is of low-pass type, and that
it has some sort of resonance at the frequency 20. Is this conclusion
correct?

Figure 8.9 Transfer function and coherence function in Exercise 8.6.
8.7 Consider the system

$$y(t) = b_1u(t-1) + b_2u(t-2) + b_3u(t-3) \tag{8.61}$$

Determine if it is possible to identify all three parameters in the model
by means of the input $u(t) = \sin(t)$.

8.8 Consider the system

$$\mathcal{S}: \quad y_k + ay_{k-1} = bu_{k-1} + w_k + cw_{k-1} \tag{8.62}$$

where $\{w_k\}$ is a zero-mean white-noise process with the variance $\mathcal{E}\{w_k^2\} =
\sigma^2$. Assume that the input $u_k = -Ky_k$, so that identifiability of $a$ and $b$
is lost. What are the asymptotic parameter estimates for large $N$ when
fitting the model

$$\mathcal{M}: \quad y_k = ay_{k-1} \tag{8.63}$$

with data from $\mathcal{S}$?

8.9 Consider the system

$$\mathcal{S}: \quad y_k + ay_{k-1} = bu_{k-1} + w_k \tag{8.64}$$

where $\{w_k\}$ is zero-mean white noise with $\mathcal{E}\{w_iw_j\} = \sigma^2\delta_{ij}$ and $u_k = 1$
for all $k \ge 0$. Show that the input is persistently exciting of order 1 but
not of order 2, and determine under what circumstances the least-squares
estimates of $a$, $b$ might be consistent.

8.10 Consider the system

$$\mathcal{S}: \quad y_k + ay_{k-1} = w_k + cw_{k-1}, \qquad w_k \in \mathcal{N}(0, \sigma^2), \quad \mathcal{E}\{w_iw_j\} = \sigma^2\delta_{ij} \tag{8.65}$$

where $a = -0.9$ and $c = 0.7$. How many samples would be required in
order to achieve the variance 0.0001 of $\hat a$ when fitting the ARMA model

$$\mathcal{M}: \quad y_k + ay_{k-1} = w_k + cw_{k-1} \tag{8.66}$$

to data generated by the system (8.65)?

8.11 Assume that the production of a continuous-flow fermentation process
can be modeled by the equations

$$\dot x = \mu x - u_{in}x, \qquad \dot s = -R\mu x + u_{in}(s_{in} - s) \tag{8.67}$$

where $x$ is the produced biomass, $s$ substrate concentration, $s_{in}$ influent
substrate concentration, $u_{in}$ influent flow rate, $R$ yield coefficient, and $\mu$
specific growth rate. What identification problems can be anticipated and
how should these be circumvented? Hint: This is a model of growth.
Figure: Residuals of 1st, 2nd, 3rd, and 4th order models versus time lag [s].


Model Validation 


9.1 INTRODUCTION

The purpose of model validation is to verify that the identified model fulfills
the modeling requirements according to subjective and objective criteria of
good model approximation. A statistical approach involves not only hypothesis
testing as to the complexity and model order of an estimated model, but also
classification of models with equal model orders.

It is usually a major objective to obtain a model of least possible complexity
within the limits of required model accuracy. In particular, it is necessary
to distinguish between the lack of fit between model and data due to random
processes and that due to lack of model complexity. Mean square convergence
is often a very attractive goal of validation because it unifies a statistical
approach with a modeling and approximation approach. Hitherto, however,
validation criteria have not quite accomplished this goal, and it is therefore
not possible to advocate any such approach as an orthodox procedure. Instead,
many existing validation methods rely on various statistical assumptions and
on statistical tests based upon such assumptions. For instance, model order
tests are relevant for all identification of parametrized models, and statistical
decision criteria have been developed which take into consideration a number
of aspects:

• Decrease of the loss function as the model order, i.e., the number of
estimated parameters, increases (F-test).

• Modified loss functions that penalize both model misfit and high model
order. Examples of this type of criteria are the AIC, FPE, and MDL
criteria presented in this chapter.

• Redundant parameters appearing in the estimated linear models, which
may suggest that the chosen model order is too high.

• Pole-zero cancellations and common factors appearing in transfer
functions of linear systems.

• A priori knowledge and physical considerations, which may suggest a
certain model order.

• Model error and residual analysis.

The residuals represent the misfit between data and model, and the presence of any information remaining in the residuals is a clue that the model might be insufficiently complex or otherwise inappropriate. Residual analysis comprises tests of such factors as:

• Independence of residuals

• Normal distribution of residuals

• Zero crossings (changes of sign) of the residual sequence

• Correlations between residuals and input

Subsequent to validation by means of residual analysis, it is necessary to evaluate the accuracy of the parameter estimates, for instance from the covariance matrix or by simulation. It is also relevant to check the following issues:

• Does the variance decrease as N increases? If the estimated variance fails to decrease as the number of measurements grows, it may indicate that the estimate is not statistically efficient.

• Is the parameter accuracy sufficient for the purpose of the model?

• Does stochastic simulation verify that the estimated model behaves qualitatively as expected?

• Does cross validation with data that have not previously been used verify that the estimated model is able to predict the behavior? This is often a very revealing test for pinpointing model anomalies.

Several validation methods and procedures are described in this chapter, in as systematic a presentation as possible. Section 9.2 covers the background necessary for meaningful validation, i.e., confirming that the method prerequisites are met, including the role of coherence tests in examining the signal conditions. Structure determination by means of statistical tests is introduced in Section 9.3. Validation tests based on residuals are treated in some detail in Section 9.4, with emphasis on tests of the assumed white-noise properties. Problems of model accuracy, with some attention to physical parametrizations, are discussed in Section 9.5. Finally, some methods for classification of the outcome of parameter estimation are presented.


9.2 METHOD PREREQUISITES 

As model validation is predicated upon a correctly performed experimental procedure, it is necessary to confirm that the method prerequisites are fulfilled. Some of the relevant tests have already been mentioned in the context of experiment planning. Obviously, it is also a task of validation to verify that these objectives have been met and that the method prerequisites allow for meaningful parameter estimation. An important check is to verify that the input has been correct and sufficiently exciting. Excitation properties can be investigated either by testing whether the persistent excitation criteria have been met or by considering the autospectra of the input signals. An input spectrum with a non-zero level over a large spectral range generally ensures suitable experimental conditions with good identification properties. Prior knowledge about the presence of feedback is also important to document. As a whole, these tests provide a satisfactory check of the experimental conditions.

Elementary signal conditions and the impact of various artifacts must also be considered at this stage, and problems due to outliers, aliasing, or lost signals should be detected and circumvented. Another source of problems is interference from discrete-time control in some subsystem, with non-harmonic distortion similar to aliasing.
Linearity can be tested visually by inspecting the signal behavior for different input amplitudes and by considering the symmetry of the response to negative and positive inputs. The coherence spectrum is also valuable as a test of linearity, as explained in detail below.

Time-variant properties of the system under investigation are sometimes obvious from recordings obtained on different occasions. Simple time-variant properties such as trends can be eliminated by numerical differentiation of the data or by subtracting some polynomial function of time fitted by linear regression. Another source of time-variant behavior is the effect of a non-zero initial condition with low damping. It should be borne in mind that low-frequency dynamics corresponding to a period longer than the measurement time may present themselves as a trend. In such situations it is sometimes a task of identification to identify the low-frequency dynamics behind such trend behavior.

However, trend elimination sometimes calls for certain precautions and bookkeeping in order to allow determination of equilibrium points and static gain, and restoration of the data.
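The polynomial trend removal and the bookkeeping mentioned above can be sketched as follows. This is a minimal illustration assuming NumPy; the helper names `detrend_poly` and `restore_trend` are hypothetical, and the synthetic record (a sinusoid on a linear drift plus an offset) is made up for the example. The fitted coefficients are returned so that the trend, and with it the equilibrium level, can be restored afterwards.

```python
import numpy as np

def detrend_poly(t, x, degree=1):
    """Subtract a polynomial in time fitted to x by linear least squares.

    Returns the detrended signal and the polynomial coefficients
    (highest power first), which are kept for later restoration."""
    coeffs = np.polyfit(t, x, degree)
    trend = np.polyval(coeffs, t)
    return x - trend, coeffs

def restore_trend(t, x_detrended, coeffs):
    """Add a previously removed polynomial trend back to the data."""
    return x_detrended + np.polyval(coeffs, t)

# Synthetic record: oscillation plus a linear drift and an offset
t = np.arange(0.0, 10.0, 0.1)                  # sampling interval 0.1 s
x = np.sin(2 * np.pi * 0.5 * t) + 0.3 * t + 2.0
x_d, c = detrend_poly(t, x, degree=1)
x_r = restore_trend(t, x_d, c)
print("fitted slope %.3f, offset %.3f" % (c[0], c[1]))
```

The fitted slope differs somewhat from the true drift of 0.3 because the oscillation is partly absorbed by the regression, which is one reason the text warns that slow dynamics may masquerade as a trend.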

Tests of disturbance conditions and noise properties can be made visually by examining records of data. It is usually valuable to determine whether the noise is independent of the input.

Coherence spectrum and test of linearity 

The coherence spectrum was presented in Section 3.9 as a measure of dependence between two signals. An important use of the coherence spectrum is as a test of signal-to-noise ratio and linearity between one or more input variables and an output variable. Assume that there is some linear multi-input single-output relationship

$$Y(s) = G_0(s)U(s) + V(s) \qquad (9.1)$$

that relates the output $Y$ to the input vector $U$ and some disturbance input $V$ uncorrelated with $U$. Assume that the output $y$ of the system has the autospectrum $S_{yy}(i\omega)$. The corresponding input $u$ is assumed to have the autospectrum $S_{uu}(i\omega)$, and the cross spectrum between input and output is $S_{yu}(i\omega)$. The (quadratic) coherence spectrum between input and output is defined as

$$\Gamma_{yu}(i\omega) = \frac{S_{yu}(i\omega)\,S_{uu}^{-1}(i\omega)\,S_{uy}(i\omega)}{S_{yy}(i\omega)} \qquad (9.2)$$

where the autospectrum of the output is

$$S_{yy}(i\omega) = G_0(i\omega)S_{uu}(i\omega)G_0^T(-i\omega) + S_{vv}(i\omega) \qquad (9.3)$$

and the cross spectrum between input and output is

$$S_{yu}(i\omega) = G_0(i\omega)S_{uu}(i\omega) \qquad (9.4)$$

The (quadratic) coherence spectrum is then

$$\Gamma_{yu}(i\omega) = \frac{S_{yu}(i\omega)S_{uu}^{-1}(i\omega)S_{uy}(i\omega)}{S_{yy}(i\omega)} = \frac{G_0(i\omega)S_{uu}(i\omega)G_0^T(-i\omega)}{G_0(i\omega)S_{uu}(i\omega)G_0^T(-i\omega) + S_{vv}(i\omega)} \qquad (9.5)$$

The coherence function expresses the degree of linear correlation in the frequency domain between the input $u$ and the output $y$. It should be stressed that there is no immediate counterpart in the time domain, i.e., the inverse Laplace transform of $\Gamma_{yu}$ has no direct interpretation. In the special case with no disturbances, where $S_{vv} = 0$, it holds that

$$\Gamma_{yu}(i\omega) = \frac{S_{yu}(i\omega)S_{uu}^{-1}(i\omega)S_{uy}(i\omega)}{S_{yy}(i\omega)} = \frac{G_0(i\omega)S_{uu}(i\omega)G_0^T(-i\omega)}{G_0(i\omega)S_{uu}(i\omega)G_0^T(-i\omega)} = 1 \qquad (9.6)$$

The coherence function may thus be viewed as a type of correlation function in the frequency domain, where a coherence value not equal to 1 indicates the presence of one or more of the following:

• Disturbance affecting $y$

• Input not represented by $u$

• Nonlinearity, so that there is no linear relationship between $u$ and $y$, i.e., no transfer function between $u$ and $y$

• Non-zero initial values with low damping

A coherence test is valuable as a test of linearity and of the effect of disturbances at an early stage of identification. The empirical coherence function is then computed as

$$\hat\Gamma_{yu}(i\omega) = \frac{\hat S_{yu}(i\omega)\hat S_{uu}^{-1}(i\omega)\hat S_{uy}(i\omega)}{\hat S_{yy}(i\omega)} \qquad (9.7)$$
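For a single input, Eq. (9.7) reduces to the magnitude-squared coherence $|\hat S_{yu}|^2/(\hat S_{uu}\hat S_{yy})$. The sketch below is illustrative only: it assumes SciPy's Welch-based `scipy.signal.coherence` as the spectral estimator (the text does not prescribe one) and a hypothetical first-order test system, and compares a noise-free output with one corrupted by additive noise.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
N = 4096
fs = 10.0                                  # sampling frequency [Hz], i.e., h = 0.1 s
u = rng.standard_normal(N)                 # white-noise test input

# Hypothetical first-order system y_k = 0.9 y_{k-1} + 0.5 u_{k-1}
y_clean = signal.lfilter([0.0, 0.5], [1.0, -0.9], u)
y_noisy = y_clean + 0.5 * rng.standard_normal(N)   # additive output disturbance

# Welch estimates of |S_yu|^2 / (S_uu S_yy), i.e., Eq. (9.7) for a single input
f, coh_clean = signal.coherence(u, y_clean, fs=fs, nperseg=256)
_, coh_noisy = signal.coherence(u, y_noisy, fs=fs, nperseg=256)

print("noise-free, median coherence: %.3f" % np.median(coh_clean))
print("noisy, lowest bins: %.3f, highest bins: %.3f"
      % (coh_noisy[1:6].mean(), coh_noisy[-10:].mean()))
```

In the noisy case the coherence stays close to 1 at low frequencies, where the system gain is high, and drops toward the Nyquist frequency, where the disturbance dominates. This is the kind of frequency-dependent signal-to-noise information shown in the coherence panel of Fig. 9.2.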


A value of coherence close to 1 usually gives promise of successful identification. It also indicates in what frequency ranges there is a good approximation
Figure 9.2 Input power spectrum (upper left), output power spectrum (upper right), transfer function (lower left), and coherence spectrum (lower right) based on input-output data from the DC-motor drive of Example 7.6. The coherence function may be used to determine weighting (or filtering) of data in the frequency domain, with a view to suppressing any region where disturbance and nonlinearity are prominent.


ment. The motor drive with a moment of inertia $J_1$ is coupled to a load via a torsional flexibility with spring constant $k$. The load is assumed to have a moment of inertia $J_2$ and a damping $d_2$. There is also a load moment $v$ acting randomly on the system. The angular positions of the shafts are designated $q_1$ and $q_2$, respectively. The equations of motion are

$$\begin{aligned} J_1\ddot q_1 &= k(q_2 - q_1) + u \\ J_2\ddot q_2 &= -k(q_2 - q_1) - d_2\dot q_2 + v \end{aligned} \qquad (9.8)$$

with a transfer function according to Eq. (7.39) and a state-space realization according to Eq. (7.38).

We use artificial data obtained by simulation of the system in Fig. 9.1 in order 
to allow for evaluation of the various methods by comparison with the correct 
values. Assume that the system is controlled and sampled with a sampling 
interval h = 0.1. The data collected from the input u and the output y thus 
Table 9.1 Estimated coefficients with standard deviations, estimated residual variances, and final prediction errors FPE (see Section 9.3 for the definition) of various model orders for Example 9.1

Model   A-coefficients        B-coefficients        C-coefficients        Estimated   Akaike
order                                                                     variance    FPE
n = 1    1.0                   0                     1.0                  2.417       2.431
        -0.1897 ± 0.03         0.7613 ± 0.03187     -0.901 ± 0.01542
n = 2    1.0                   0                     1.0                  1.478       1.495
        -0.2801 ± 0.0375       0.7965 ± 0.03857     -0.812 ± 0.04952
         0.5767 ± 0.02303      0.26 ± 0.05234        0.05104 ± 0.04705
n = 3    1.0                   0                     1.0                  1.026       1.046
        -1.064 ± 0.0381        0.8102 ± 0.03245     -1.884 ± 0.05084
         0.9233 ± 0.03634     -0.5814 ± 0.06472      1.127 ± 0.0912
        -0.4001 ± 0.02329      0.2372 ± 0.04354     -0.09273 ± 0.04885
n = 4    1.0                   0                     1.0                  1.004       1.029
        -0.02846 ± 0.05744     0.7876 ± 0.03189     -0.871 ± 0.06252
         0.05379 ± 0.06262     0.2397 ± 0.05408     -0.6626 ± 0.1011
         0.4431 ± 0.05577     -0.2215 ± 0.04507      0.8384 ± 0.07576
        -0.2879 ± 0.03423      0.3217 ± 0.04106      0.03014 ± 0.04673
n = 5    1.0                   0                     1.0                  1.004       1.035
         0.8249 ± 0.07399      0.7995 ± 0.03203     -0.02023 ± 0.07774
        -0.03971 ± 0.04821     0.9377 ± 0.05813     -1.454 ± 0.07542
         0.4387 ± 0.03815     -0.07859 ± 0.06214     0.2981 ± 0.08395
         0.1173 ± 0.04356      0.04999 ± 0.04758     0.8129 ± 0.07452
        -0.2868 ± 0.03325      0.3013 ± 0.04329     -0.04175 ± 0.04696
n = 6    1.0                   0                     1.0                  1.004       1.041
         0.2846 ± 0.05793      0.7836 ± 0.03205     -0.5541 ± 0.062
         1.039 ± 0.04985       0.4629 ± 0.04947      0.0615 ± 0.08111
         0.4828 ± 0.08371      0.6614 ± 0.03544     -0.1888 ± 0.07864
        -0.1133 ± 0.07164      0.5287 ± 0.04897     -0.4304 ± 0.0801
         0.3537 ± 0.04634     -0.1281 ± 0.04217      0.8315 ± 0.06296
        -0.2833 ± 0.03307      0.3193 ± 0.04173      0.05853 ± 0.04758

correspond to data generated from a model with the discrete-time transfer function

$$H(z) = \frac{B(z)}{A(z)} = \frac{0.848z^2 - 0.629z + 0.313}{z^3 - 0.986z^2 + 0.888z - 0.368} \qquad (9.9)$$

obtained by means of a zero-order-hold assumption regarding the control input. The noise model is according to Fig. 9.1 with the transfer function

$$\frac{C(z)}{A(z)} = \frac{z^3 - 1.8z^2 + 0.97z}{z^3 - 0.986z^2 + 0.888z - 0.368} \qquad (9.10)$$
where the coefficients of Eqs. (9.9) and (9.10) in turn correspond to the physical parameters $J_1 = 0.1$, $J_2 = 0.1$, $k = 10$, $d_2 = 1$. The continuous-time noise variance is $E\{v^2\} = \sigma^2 = 1$ and $E\{u^2\} = 1$, so that the signal-to-noise ratio for the input variables is equal to one. The diagrams of input and output data shown in Fig. 9.1 give no reason to suspect the presence of artifacts in the form of signal loss, outliers, or aliasing; neither is there any obvious time dependence, nor are any trends perceptible in the input-output data. The coherence spectrum between input $u$ and output $y$ in Fig. 9.2, with coherence close to 1 for frequencies below 0.0 Hz, verifies that there is good coherence between the two signals in the middle and lower frequency ranges. In all likelihood, therefore, the noise spectrum is of significant magnitude mainly in the higher frequency ranges, and the input apparently provides sufficient excitation over at least two decades of frequency.

Maximum-likelihood identification with $N = 1000$ samples of input and output provides parameter estimates according to Table 9.1 for model orders $n = 1, \ldots, 6$. The coefficients of the ARMAX model, their standard deviations, the estimated residual variance, and the final prediction error FPE (for definition see Section 9.3) are provided for all low-order models. Table 9.1 shows that the estimated residual variance does not decrease significantly for model orders $n > 3$.


9.3 MODEL ORDER DETERMINATION 

Unless a priori considerations in the field of application suggest the correct model order, an aim of identification may be to find at least an adequate, if not the true, model. As the correct model order is often not known a priori, it makes sense to postulate several different model orders. Based on these, one then computes some error criterion that indicates which model order to choose. One intuitive approach would be to construct ARMAX models of increasing order until the computed prediction error power reaches a minimum. However, as shown previously, all least-squares estimation loss functions decrease monotonically with increasing model order. Adding parameters to the model reduces the sum of squared residuals, but parameters that do not reduce it by much may be of little value. Thus, the prediction error or the loss function alone might not be sufficient to indicate when to terminate the search for the correct model, and it is natural to adopt a statistical hypothesis testing approach to this problem (see Appendix B). Consequently, the experimenter adopts the hypothesis that the higher-order model has no ameliorating effect and tests whether the data will disprove this. The decision should be supported by some test statistic, i.e., a function of the data. It is standard procedure to formalize this reasoning as a null hypothesis $\mathcal{H}_0$ and an alternative hypothesis $\mathcal{H}_A$. The essence of a statistical test is a decision rule that tells whether the collected data call for accepting or rejecting certain model hypotheses. We illustrate these ideas by presenting the following model order test.

F-test of system order n 

Assume that the system $\mathcal{S}$ is of model order $n_0$ and is adequately described by $p_0$ parameters

$$\mathcal{S}:\quad Ay = Bu + Cw, \qquad w \in \mathcal{N}(0, \sigma^2) \qquad (9.11)$$

Assume that there are two fitted models $\mathcal{M}_1$ and $\mathcal{M}_2$ of model orders $n_1$ and $n_2$ such that $n_0 \le n_1 < n_2$, and assume that the numbers of estimated parameters are $p_1$ and $p_2$, respectively, with $p_0 \le p_1 < p_2$. Let the minimum loss functions for the parameter numbers $p_1$ and $p_2$ be denoted $V^1$ and $V^2$, respectively.

First suppose that we are to test the null hypothesis $\mathcal{H}_0$ that model $\mathcal{M}_1$ is correct, a model that is included as a special case of model $\mathcal{M}_2$. The alternative hypothesis would be that the true model is included in model $\mathcal{M}_2$ but is not as simple as $\mathcal{M}_1$. Let it be assumed in both models that the noise components are independent and normal with constant variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Now we test the model order by adopting the null hypothesis

$$\mathcal{H}_0:\quad \mathcal{M}_2 \supset \mathcal{M}_1 \ \text{and}\ \sigma_1^2 = \sigma_2^2 \qquad (9.12)$$

which means that if $\sigma_1^2 = \sigma_2^2$, then the simpler of the two models may be chosen. The alternative hypothesis is

$$\mathcal{H}_A:\quad \mathcal{M}_2 \supset \mathcal{M}_1 \ \text{and}\ \sigma_1^2 > \sigma_2^2 \qquad (9.13)$$

which is predicated upon the possibility of reducing the variance by choosing $\mathcal{M}_2$ instead of $\mathcal{M}_1$. Thus, if the hypothesis test leads us to reject $\mathcal{H}_0$, then we conclude that $\mathcal{M}_1$ has insufficient model complexity.

The loss function is also expected to decrease when the model order $n$ increases beyond the appropriate system order. It is therefore relevant to ask: what is a significant decrease of the loss function $V$ as the model order $n$ increases? Now let $N$ denote the number of observations. Under the assumptions made in the context of prediction error methods, namely that the loss function $V_N(\hat\theta_N)$ represents a sum of squares of independent, normally distributed random variables with zero mean, it follows from Cochran's theorem of statistics (see Appendix B) that
Table 9.2 Some percentiles $\chi^2_p$ of the $\chi^2$-distribution. For degrees of freedom $k > 30$ it is possible to approximate the percentile as $\chi^2_p \approx \tfrac{1}{2}\big(z_p + \sqrt{2k-1}\big)^2$, where $z_p$ is the corresponding percentile of the standard normal distribution

Degrees of
freedom k   χ²_.005  χ²_.01  χ²_.025  χ²_.05  χ²_.95  χ²_.975  χ²_.99  χ²_.995
 1          0.00     0.00    0.001    0.004   3.84    5.02     6.63    7.88
 2          0.010    0.020   0.051    0.103   5.99    7.38     9.21    10.6
 3          0.072    0.115   0.216    0.352   7.81    9.35     11.3    12.8
 4          0.207    0.297   0.484    0.711   9.49    11.1     13.3    14.9
 5          0.412    0.554   0.831    1.15    11.1    12.8     15.1    16.7
 6          0.676    0.872   1.24     1.64    12.6    14.4     16.8    18.5
 7          0.989    1.24    1.69     2.17    14.1    16.0     18.5    20.3
 8          1.34     1.65    2.18     2.73    15.5    17.5     20.1    22.0
 9          1.73     2.09    2.70     3.33    16.9    19.0     21.7    23.6
10          2.16     2.56    3.25     3.94    18.3    20.5     23.2    25.2
20          7.43     8.26    9.58     10.9    31.4    34.2     37.6    40.0
30          13.8     15.0    16.8     18.5    43.8    47.0     50.9    53.7
40          20.7     22.1    24.4     26.5    55.8    59.3     63.7    66.8
50          28.0     29.7    32.3     34.8    67.5    71.4     76.2    79.5


i. $2(V^1 - V^2)/\sigma^2$ is $\chi^2(p_2 - p_1)$-distributed under the model $\mathcal{M}_1$;

ii. $2V^2/\sigma^2$ is $\chi^2(N - p_2)$-distributed under the model $\mathcal{M}_2$;

iii. $V^2$ and $V^1 - V^2$ are independent under $\mathcal{M}_1$.

(9.14)

Assuming $V^2$ and $V^1 - V^2$ to be asymptotically statistically independent (iii) and using their $\chi^2$-distribution properties (see Table 9.2), we can design a relevant test statistic. It is well known from statistical theory that the ratio of two independent $\chi^2$-distributed variables, each divided by its number of degrees of freedom, is F-distributed with the corresponding degrees of freedom $m_1$ and $m_2$ (see Table 9.3). A relevant test statistic for verifying that the correct model order has been found is

$$t_F(p_2, p_1) = \frac{V^1 - V^2}{V^2}\cdot\frac{N - p_2}{p_2 - p_1} \in F(N - p_2,\ p_2 - p_1) \qquad (9.15)$$

which is an F-distributed variable with the degrees of freedom $N - p_2$ and $p_2 - p_1$ under the null hypothesis $\mathcal{H}_0$ expressed in Eq. (9.12). The criterion for rejection of the null hypothesis is

$$t_F > F_\alpha(N - p_2,\ p_2 - p_1) \qquad (9.16)$$

where $\alpha$ is the probability that the null hypothesis $\mathcal{H}_0$ is rejected when it is true (i.e., the significance level of the test), and where $F_\alpha$ is the corresponding upper $\alpha$-percentile of the F-distribution.

Example 9.2—F-test of model order 

Consider the model of a DC drive in Example 9.1 and Table 9.1. To test the hypothesis that the residual variances of two models are equal (i.e., the null hypothesis $\mathcal{H}_0$), we compute the F-test statistic for $N = 1000$. As the ARMAX model of order $n$ has $p = 3n$ estimated parameters, and as we choose to test model order $n_1$ against $n_2 = n_1 + 1$ with $p_1 = 3n_1$ and $p_2 = 3(n_1 + 1)$, we find by means of Table 9.1 for $n_1 = 1, \ldots, 5$:

$$\begin{aligned}
n_1 = 1:&\quad t_F(6,3) = \frac{2417 - 1478}{1478}\cdot\frac{1000 - 6}{6 - 3} \approx 210\\
n_1 = 2:&\quad t_F(9,6) = \frac{1478 - 1026}{1026}\cdot\frac{1000 - 9}{9 - 6} \approx 146\\
n_1 = 3:&\quad t_F(12,9) = \frac{1026 - 1004}{1004}\cdot\frac{1000 - 12}{12 - 9} \approx 7.21\\
n_1 = 4:&\quad t_F(15,12) = \frac{1004 - 1004}{1004}\cdot\frac{1000 - 15}{15 - 12} = 0\\
n_1 = 5:&\quad t_F(18,15) = \frac{1004 - 1004}{1004}\cdot\frac{1000 - 18}{18 - 15} = 0
\end{aligned}$$

This procedure generates the decision table

n1   n2   t_F    F_0.05(∞,3)   accept           F_0.01(∞,3)   accept
1    2    210    8.53          $\mathcal{H}_A$   26.1          $\mathcal{H}_A$
2    3    146    8.53          $\mathcal{H}_A$   26.1          $\mathcal{H}_A$
3    4    7.21   8.53          $\mathcal{H}_0$   26.1          $\mathcal{H}_0$
4    5    0      8.53          $\mathcal{H}_0$   26.1          $\mathcal{H}_0$
5    6    0      8.53          $\mathcal{H}_0$   26.1          $\mathcal{H}_0$

(9.17)

Comparing the test statistics to the 95% percentile $F_{0.05}(991, 3) \approx 8.53$, we find that the null hypothesis $\mathcal{H}_0$ for model order $n_1 = 3$ is not rejected, i.e., the third-order model is accepted.
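The arithmetic of Example 9.2 is easy to reproduce. The sketch below (plain Python; the function name `f_statistic` is hypothetical) evaluates Eq. (9.15) from the estimated variances of Table 9.1. Since the loss function is proportional to the estimated variance by Eq. (9.18), the variances can be used in place of $V^1$ and $V^2$, as in the example, without changing the ratio.

```python
# Estimated residual variances from Table 9.1 (model orders n = 1..6).
sigma2 = {1: 2.417, 2: 1.478, 3: 1.026, 4: 1.004, 5: 1.004, 6: 1.004}
N = 1000

def f_statistic(v1, v2, p1, p2, N):
    """Test statistic of Eq. (9.15): (V1 - V2)/V2 * (N - p2)/(p2 - p1)."""
    return (v1 - v2) / v2 * (N - p2) / (p2 - p1)

t_F = {}
for n1 in range(1, 6):
    p1, p2 = 3 * n1, 3 * (n1 + 1)   # ARMAX model of order n: p = 3n parameters
    t_F[n1] = f_statistic(sigma2[n1], sigma2[n1 + 1], p1, p2, N)
    print(f"n1 = {n1}, n2 = {n1 + 1}: t_F = {t_F[n1]:.2f}")
```

The printed values (approximately 210.5, 145.5, 7.22, 0, and 0) agree with the rounded values of Example 9.2; the threshold $F_\alpha$ is then read from Table 9.3.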


The Akaike information criterion (AIC) 

It is obvious from Table 9.1 that the estimated variance $\hat\sigma^2$ and the loss function decrease as the model order increases. Assume that the loss function is
Table 9.3 Percentiles $F_{0.05}(m_1, m_2)$ of the F-distribution for degrees of freedom $m_1$ (columns) and $m_2$ (rows), respectively.

m2\m1   1      2      3      4      5      6      8      10     20     30     ∞
1       161    200    216    225    230    234    239    242    248    250    254
2       18.5   19.0   19.2   19.2   19.3   19.3   19.4   19.4   19.4   19.5   19.5
3       10.1   9.55   9.28   9.12   9.01   8.94   8.85   8.79   8.66   8.62   8.53
4       7.71   6.94   6.59   6.39   6.26   6.16   6.04   5.96   5.80   5.75   5.63
5       6.61   5.79   5.41   5.19   5.05   4.95   4.82   4.74   4.56   4.50   4.36
6       5.99   5.14   4.76   4.53   4.39   4.28   4.15   4.06   3.87   3.81   3.67
7       5.59   4.74   4.35   4.12   3.97   3.87   3.73   3.64   3.44   3.38   3.23
8       5.32   4.46   4.07   3.84   3.69   3.58   3.44   3.35   3.15   3.08   2.93
9       5.12   4.26   3.86   3.63   3.48   3.37   3.23   3.14   2.94   2.86   2.71
10      4.96   4.10   3.71   3.48   3.33   3.22   3.07   2.98   2.77   2.70   2.54
20      4.35   3.49   3.10   2.87   2.71   2.60   2.45   2.35   2.12   2.04   1.84
30      4.17   3.32   2.92   2.69   2.53   2.42   2.27   2.16   1.93   1.84   1.62
∞       3.84   3.00   2.60   2.37   2.21   2.10   1.94   1.83   1.57   1.46   1.00


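obtained from a least-squares model of order $n$ with $p$ estimated parameters $\hat\theta_N$, and that the model is fitted with data from $N$ samples

$$\hat\sigma^2 = \frac{2}{N}V_N(\hat\theta_N) = \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k^2(\hat\theta_N), \qquad \hat\theta_N \in \mathbb{R}^p \qquad (9.18)$$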

An interesting question is whether the loss function $V$ can be replaced by some other relevant optimization index that also supports estimation of structural parameters such as the model order or the number of model parameters. An optimization criterion that penalizes a high model order more effectively than the least-squares criterion can be obtained by adding a term to the least-squares loss function such that the function grows as the model order $n$ and the number $p$ of model parameters increase.

One attempt to include both the estimated variance and the model complexity in one statistic is the Akaike information criterion (AIC), which decreases as the residual variance $\hat\sigma^2$ decreases and increases as the number of parameters $p$ increases. As the expected residual variance decreases with increasing $p$ for inadequate model complexities, there should be a minimum around the correct number $p$. Let $\log L(\theta)$ denote the log-likelihood function
Figure 9.3 Some model validation criteria evaluated for the DC-motor example (horizontal axes: model order): the estimated variance (upper left); the Akaike model order tests FPE (upper right) and AIC (lower left); the MDL test (lower right). All tests support that a third-order model is the appropriate model order.

of $\theta$ based on $N$ observations. By optimizing a measure of distance between the “true” likelihood distribution and the observed one in the form of the Kullback-Leibler information

$$J(\theta, \hat\theta) = E\{\log L(\theta) - \log L(\hat\theta)\}, \qquad \theta \in \mathbb{R}^p \qquad (9.19)$$

it is possible to motivate the Akaike information criterion

$$\mathrm{AIC}(p) = \log\hat\sigma^2(\hat\theta_N) + \frac{2p}{N}, \qquad \hat\theta_N \in \mathbb{R}^p \qquad (9.20)$$

which, if statistically consistent, would attain its minimum for the correct number of parameters. However, it can be shown that the AIC is statistically inconsistent and tends to overestimate the model order, which motivates other criteria. One alternative is the minimum description length (MDL) statistic suggested by Rissanen:

MDL(p) = log a 2 + log N -t- ^log|j0*||M 


(9.21) 
where $M$ is the Fisher information matrix. As $N \to \infty$, the MDL is a statistically consistent criterion for choosing the model order.

The results of the AIC and MDL criteria applied to model order determination in Example 9.1 are shown in Fig. 9.3.
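The AIC of Eq. (9.20) is easily evaluated for a family of least-squares models. The sketch below is illustrative only and does not use the DC-drive data: it fits AR models of increasing order by linear regression to data simulated from a hypothetical second-order AR process and evaluates $\log\hat\sigma^2 + 2p/N$ for each order. The function names are hypothetical, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy import signal

def ar_residual_variance(y, p):
    """Least-squares fit of y_k = a_1 y_{k-1} + ... + a_p y_{k-p} + e_k.

    Returns the mean squared residual over the N - p usable samples."""
    N = len(y)
    # Row for time k holds (y_{k-1}, ..., y_{k-p}), for k = p, ..., N-1
    Phi = np.column_stack([y[p - i:N - i] for i in range(1, p + 1)])
    target = y[p:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    resid = target - Phi @ theta
    return np.mean(resid ** 2)

def aic(sigma2, p, N):
    """Akaike information criterion in the form of Eq. (9.20)."""
    return np.log(sigma2) + 2.0 * p / N

rng = np.random.default_rng(1)
N = 2000
e = rng.standard_normal(N)
# Hypothetical test process: y_k = 1.5 y_{k-1} - 0.7 y_{k-2} + e_k
y = signal.lfilter([1.0], [1.0, -1.5, 0.7], e)

aic_values = {p: aic(ar_residual_variance(y, p), p, N) for p in range(1, 9)}
p_hat = min(aic_values, key=aic_values.get)
for p, value in aic_values.items():
    print(f"p = {p}: AIC = {value:.4f}")
print("AIC-selected order:", p_hat)
```

The true order of this hypothetical process is 2. Because the AIC is not consistent, a slightly higher order may occasionally be selected, which is one motivation for the MDL criterion.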

The Akaike final prediction error (FPE) 

It was noted that the average prediction error is expected to decrease as the number of estimated parameters increases. One reason for this behavior is that the prediction errors are computed for the same data set that was used for parameter estimation. It is now relevant to ask what prediction performance can be expected when the estimated parameters are applied to another data set. It might well be suspected that a large, overparametrized model would poorly predict the behavior of a new data set. In order to analyze this situation, we consider the expected prediction error for the $p$ parameter estimates $\hat\theta$ based on $N$ data fitted to some linear model $y_k = \phi_k^T\theta + w_k$, where

$$\begin{aligned}
E\{\varepsilon_k^2(\hat\theta)\} &= E\{(y_k - \phi_k^T\hat\theta)^2\} \\
&= E\{(y_k - \phi_k^T\theta)^2\} + E\{(\phi_k^T\theta - \phi_k^T\hat\theta)^2\} \\
&= E\{w_k^2\} + \mathrm{tr}\big(E\{\tilde\theta\tilde\theta^T\phi_k\phi_k^T\}\big) \approx \sigma^2 + \sigma^2\frac{p}{N}
\end{aligned} \qquad (9.22)$$

where $\tilde\theta = \theta - \hat\theta$ denotes the parameter error. The term $\sigma^2$ derives from the noise variance, whereas the contribution $p\sigma^2/N$ derives from the parameter errors. Thus, the expected prediction error decreases as the number of observations increases, whereas it increases as the number of estimated parameters increases. However, the expected loss function under the null hypothesis when estimating $p$ parameters based on $N$ observations is

$$E\{V_N(\hat\theta_N)\} = \tfrac{1}{2}\sigma^2(N - p) \qquad (9.23)$$

which decreases as the number of parameters increases. Solving Eq. (9.23) for $\sigma^2$ with $\hat\sigma^2 = 2V_N(\hat\theta_N)/N$ and inserting the result into Eq. (9.22), the final prediction error (FPE) criterion is estimated as

$$\mathrm{FPE}(p) = \hat\sigma^2\,\frac{1 + p/N}{1 - p/N} \qquad (9.24)$$

Hence, the quality of identification measured as expected prediction accuracy can be improved by introducing new parameters to be estimated as long as each new parameter can be accurately estimated. Thus we choose

$$\hat p = \arg\min_p \mathrm{FPE}(p) \qquad (9.25)$$
Here the second factor in Eq. (9.24) increases as the number $p$ of estimated parameters increases. The FPE criterion consists of choosing the model with the minimum FPE, which determines the number of parameters $p$ and hence the model order. It is, however, sometimes observed that the FPE tends to underestimate the correct order of a system.
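As a numerical check of Eq. (9.24), the FPE column of Table 9.1 can be recomputed from the estimated variances. This sketch assumes, as in Example 9.2, that an ARMAX model of order $n$ has $p = 3n$ parameters and that $N = 1000$; the function name `fpe` is hypothetical. The recomputed values agree with the tabulated ones to within the rounding of the variances (about 0.002).

```python
# Estimated residual variances and FPE values as reported in Table 9.1
N = 1000
sigma2 = {1: 2.417, 2: 1.478, 3: 1.026, 4: 1.004, 5: 1.004, 6: 1.004}
fpe_table = {1: 2.431, 2: 1.495, 3: 1.046, 4: 1.029, 5: 1.035, 6: 1.041}

def fpe(sigma2_hat, p, N):
    """Akaike final prediction error, Eq. (9.24)."""
    return sigma2_hat * (1 + p / N) / (1 - p / N)

fpe_computed = {}
for n, s2 in sigma2.items():
    p = 3 * n                      # ARMAX model of order n has p = 3n parameters
    fpe_computed[n] = fpe(s2, p, N)
    print(f"n = {n}: FPE = {fpe_computed[n]:.4f}  (Table 9.1: {fpe_table[n]})")
```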

The decision criteria AIC, MDL, and FPE all include the model order or the number of parameters as a model parameter in the loss function to be minimized. An attractive property is that the minimum of the test statistic indicates which model to choose, so that these test statistics need not be compared against some statistical significance level; see Fig. 9.3.

In addition to the methods presented, there exist consistent criteria, such as the Wald statistic, for order estimation based on rank conditions. Another method for model order determination is based on singular value decomposition, but discussion of this must wait until the following chapter on model reduction.


9.4 RESIDUAL TESTS 

Linear prediction error models of a system $\mathcal{S}$ are generally based on some transfer function relationship $Y(z) = H_u(z)U(z) + H_w(z)W(z)$. The residuals obtained as

$$\varepsilon(z) = H_w^{-1}(z)\big(Y(z) - H_u(z)U(z)\big) \qquad (9.26)$$

represent a disturbance input or innovations that would explain the mismatch between the observed data and the behavior of the estimated model. A sequence of residuals that still exhibits some structure would then indicate that either the modeling or the identification is not complete. If the model is correct and if the method prerequisites are satisfied, then the residuals should be structureless; in particular, they should be uncorrelated with any other variable, including inputs and outputs. This is the assumption upon which the tests known as residual tests are based.

A simple check is to plot the residuals versus the fitted values; such a plot should not reveal any obvious pattern. Another valuable diagram is the histogram of the residual amplitudes, which reveals departures from the normal distribution (see Fig. 9.4) and is a valuable complement to analysis of variance.
Figure 9.4 Histogram of residuals for DC-drive models of model orders n = 1-4 (horizontal axes: residual magnitude). The normal distribution $\mathcal{N}(0, \sigma^2)$ (dotted line) is shown for comparison.

The following tests (as formulated here) apply to time-invariant systems and are based on statistical analysis of the residuals $\{\varepsilon_k(\hat\theta_N)\}$. The null hypothesis $\mathcal{H}_0$ is

i. $\{\varepsilon_k\}$ constitute a white-noise process with mean 0
ii. $\{\varepsilon_k\}$ are normally distributed
iii. $\{\varepsilon_k\}$ are symmetrically distributed
iv. $\{\varepsilon_k\}$ are independent of previous inputs, $E\{\varepsilon_i u_j\} = 0$ for $i > j$
v. $\{\varepsilon_k\}$ are independent of all inputs, $E\{\varepsilon_i u_j\} = 0$ for all $i, j$, if there is no feedback

This null hypothesis $\mathcal{H}_0$ for the residuals can be used for several statistical tests, of which we present the autocorrelation test, the cross-correlation test, the test of normality, and the test of the number of zero crossings (changes of sign).

Autocorrelation test


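Consider the autocovariance function of the residuals

$$C_{\varepsilon\varepsilon}(\tau) = \frac{1}{N}\sum_{k=\tau+1}^{N}\varepsilon_k\varepsilon_{k-\tau} \qquad (9.27)$$

and define the vector of residual autocorrelations as

$$r_{\varepsilon\varepsilon} = \frac{1}{C_{\varepsilon\varepsilon}(0)}\begin{pmatrix} C_{\varepsilon\varepsilon}(1) & \cdots & C_{\varepsilon\varepsilon}(m) \end{pmatrix}^T \qquad (9.28)$$

for some number $m$. Under the null hypothesis, and according to the central limit theorem, this statistic is asymptotically distributed as

$$\sqrt{N}\,r_{\varepsilon\varepsilon} \xrightarrow{d} \mathcal{N}(0, I_{m\times m}) \qquad (9.29)$$

An autocorrelation test statistic can be formulated as the quantity

$$\tau_{\varepsilon\varepsilon} = N\,r_{\varepsilon\varepsilon}^T r_{\varepsilon\varepsilon} \in \chi^2(m) \qquad (9.30)$$

which can be tested with standard analysis of variance. The null hypothesis $\mathcal{H}_0$ assumes that the residual mean is zero, which helps to avoid a reduction of the degrees of freedom of the test. The decision criterion based on the null hypothesis is

$$\begin{cases} \tau_{\varepsilon\varepsilon} > \chi^2_\alpha(m): & \text{reject } \mathcal{H}_0 \\ \tau_{\varepsilon\varepsilon} \le \chi^2_\alpha(m): & \text{accept } \mathcal{H}_0 \end{cases} \qquad (9.31)$$

where $\alpha$ is the significance level and $\chi^2_\alpha(m)$ is the corresponding upper percentile of the $\chi^2(m)$ distribution (e.g., $\chi^2_{.95}(m)$ in Table 9.2 for $\alpha = 0.05$). Rejection of $\mathcal{H}_0$ should imply that the estimated model is refuted.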
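A minimal sketch of the autocorrelation test of Eqs. (9.27)-(9.31), assuming NumPy and SciPy (`scipy.stats.chi2.ppf` supplies the percentile). The function name `autocorrelation_test` and the two synthetic residual sequences are hypothetical; the autocorrelations are normalized by $C_{\varepsilon\varepsilon}(0)$ so that $\sqrt{N}r_{\varepsilon\varepsilon}$ is asymptotically $\mathcal{N}(0, I)$ under the null hypothesis. The per-lag band $\pm 1.96/\sqrt{N}$ used in diagrams such as Fig. 9.5 is returned as well.

```python
import numpy as np
from scipy import stats

def autocorrelation_test(eps, m, alpha=0.05):
    """Residual whiteness test following Eqs. (9.27)-(9.31).

    Returns r(1..m), the statistic tau = N r^T r, the chi-square
    threshold at significance level alpha, and the per-lag 95% band."""
    eps = np.asarray(eps, dtype=float)
    N = len(eps)
    # C(tau) = (1/N) * sum_{k=tau+1}^{N} eps_k eps_{k-tau}, tau = 0..m
    C = np.array([np.dot(eps[tau:], eps[:N - tau]) / N for tau in range(m + 1)])
    r = C[1:] / C[0]                       # normalized autocorrelations
    tau_stat = N * np.dot(r, r)
    threshold = stats.chi2.ppf(1.0 - alpha, m)
    band = 1.96 / np.sqrt(N)
    return r, tau_stat, threshold, band

rng = np.random.default_rng(2)
N, m = 1000, 50
white = rng.standard_normal(N)                              # structureless residuals
colored = np.convolve(white, [1.0, 0.8], mode="full")[:N]   # MA(1)-correlated residuals

for name, eps in [("white", white), ("colored", colored)]:
    r, tau_stat, thr, band = autocorrelation_test(eps, m)
    decision = "reject H0" if tau_stat > thr else "accept H0"
    print(f"{name}: tau = {tau_stat:.1f}, threshold = {thr:.1f} -> {decision}")
```

With the threshold $\chi^2_{.95}(50) \approx 67.5$ (cf. Table 9.2), the correlated sequence is clearly rejected, whereas a white sequence will typically, though at a 5% significance level not always, be accepted.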

Example 9.3—Validation of a DC-drive model (cont’d.) 

Computation of the autocorrelation test statistic for $m = 50$ gives the values

n     $\tau_{\varepsilon\varepsilon}$
1     802.2
2     70.9
3     40.9
4     45.5
5     44.4
6     46.8
7     42.3
8     41.8
9     36.6
10    44.2

(9.32)
Figure 9.5 Residual autocorrelations for some model orders n = 1-4 (horizontal axes: time lag [s]) with 95%-confidence interval limits (dotted line) and 99%-confidence interval limits (dashed line) for noncorrelated residuals.


With $N = 1000$ and the percentile $\chi^2_{.95}(50) = 67.5$, the test statistics above satisfy $\tau_{\varepsilon\varepsilon} < 67.5$ for $n \ge 3$, which suggests that the null hypothesis should be accepted for $n \ge 3$.

A standard way to present this decision problem in the case of interactive identification is to show a diagram of $r_{\varepsilon\varepsilon}$; see Fig. 9.5. The 95%-confidence interval for the asymptotic distribution of each component is $[-1.96/\sqrt{N},\ 1.96/\sqrt{N}]$, which is often drawn in the same diagram. A test for each of the components $r_{\varepsilon\varepsilon}(k)$, $1 \le k \le m$, based on their asymptotic normality can then be suggested as follows. If $r_{\varepsilon\varepsilon}(k)$ is within the indicated interval, then the null hypothesis $\mathcal{H}_0$ can be accepted for that value of $k$. Moreover, if the whole function $r_{\varepsilon\varepsilon}$ is within the indicated margins, then the null hypothesis $\mathcal{H}_0$ can be accepted (or, more correctly, it is not rejected).
Figure 9.6 Residual cross correlations for some model orders n = 1-4 with 95%-confidence interval limits (dotted line) and 99%-confidence interval limits (dashed line) for noncorrelated residuals.


Cross-correlation test 

A similar test for the independence of the input $u$ and the residuals is based on the cross-covariance function

$$C_{u\varepsilon}(\tau) = \frac{1}{N}\sum_{k=\tau+1}^{N} u_{k-\tau}\,\varepsilon_k \qquad (9.33)$$

Let m be the time interval (expressed as a number of samples) over which 
residual correlations should be investigated and define the vector 
and the matrix

$$R_{uu}(m) = \frac{1}{N-m}\sum_{k=m+1}^{N} \begin{pmatrix} u_{k-1} \\ \vdots \\ u_{k-m} \end{pmatrix} \begin{pmatrix} u_{k-1} & \cdots & u_{k-m} \end{pmatrix} \qquad (9.35)$$

Under the null hypothesis $H_0$ it is straightforward to verify that the asymptotic distribution is

$$\sqrt{N}\, r_{u\varepsilon} \xrightarrow{d} \mathcal{N}(0, R_{uu}) \qquad (9.36)$$

Thus, we can formulate the cross-correlation test statistic

$$T_{u\varepsilon}(m) = N\, r_{u\varepsilon}^T R_{uu}^{-1} r_{u\varepsilon} \in \chi^2(m) \qquad (9.37)$$

All these test quantities can be used for statistical hypothesis testing. The null hypothesis $H_0$ is predicated upon a residual mean of zero, which helps to avoid a reduction of the degrees of freedom of the $\chi^2$ test.

It is also valuable to inspect $r_{u\varepsilon}$ for negative lags, where nonzero values indicate that the residuals affect future inputs via some feedback mechanism (or that the system or model is noncausal in its behavior). The cross-correlation test may thus be used as an indicator of feedback.

Example 9.4—Validation of a DC-drive model (cont’d.)

Computation of the cross-correlation test statistic for $m = 50$ gives the result


n     T_uε
1     127.7
2     164.4
3      65.1
4      63.1
5      60.6
6      56.9
7      59.9
8      56.4
9      57.5
10     59.1        (9.38)


The percentile $\chi^2_{0.95}(50) = 67.5$ suggests that the decision criterion for rejection of the alternative hypothesis based on the cross-correlation test statistic should be $T_{u\varepsilon} < 67.5$, i.e., that the null hypothesis should be rejected for $n \leq 2$ and that a third-order model may be adopted.
Results of these computations are often presented in a manner similar to that of the autocorrelation, with 95%-confidence intervals $[-1.96/\sqrt{N},\, 1.96/\sqrt{N}]$ (see Fig. 9.6). If $r_{u\varepsilon}(k)$ is within the indicated interval, then the null hypothesis can be accepted for that value of $k$. If the whole function $r_{u\varepsilon}$ is within the indicated margins, then the null hypothesis can be accepted. ■
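A corresponding sketch of the cross-correlation test follows. The vector (9.34) is not reproduced in this section, so the code assumes that $r_{u\varepsilon}$ collects $C_{u\varepsilon}(\tau)$, $\tau = 1, \ldots, m$, divided by the residual standard deviation, which is the normalization under which (9.36) holds with covariance $R_{uu}$. Function and variable names are hypothetical.

```python
import numpy as np

def cross_correlation_test(u, eps, m, threshold):
    """Cross-correlation statistic T_ue(m) = N r^T R_uu^{-1} r, cf. (9.37).

    Assumption: r(tau) = C_ue(tau) / sigma_eps for tau = 1, ..., m,
    with C_ue as in (9.33) and zero-mean residuals.
    """
    u = np.asarray(u, dtype=float)
    eps = np.asarray(eps, dtype=float)
    n = len(eps)
    sigma = np.sqrt(np.dot(eps, eps) / n)
    # C_ue(tau) = (1/N) * sum_{k=tau+1}^{N} eps_k * u_{k-tau}
    r = np.array([np.dot(eps[tau:], u[:n - tau]) / n
                  for tau in range(1, m + 1)]) / sigma
    # R_uu(m) as in (9.35): average of phi_k phi_k^T,
    # phi_k = (u_{k-1}, ..., u_{k-m}) for k = m+1, ..., N
    phi = np.array([u[k - m:k][::-1] for k in range(m, n)])
    r_uu = phi.T @ phi / (n - m)
    stat = float(n * (r @ np.linalg.solve(r_uu, r)))
    return stat, stat < threshold
```

Negative lags, used as a feedback indicator, could be inspected in the same way with the inner product `np.dot(eps[:n - tau], u[tau:])`, which pairs $\varepsilon_k$ with the later input $u_{k+\tau}$.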

A common source of systematic failure of this test for all model orders is the presence of artifacts in data. A typical defect is a single residual that is much larger than all the others. Such a residual can be identified as an outlier, and outliers generally cause problems in least-squares estimation. Problems due to outliers should be circumvented by eliminating such data before parameter estimation takes place.

Example 9.5—Validation of a DC-motor model (cont’d.) 

Assume that data with input $u$ and output $y$ are generated from a model with the transfer function

$$S:\quad H_u(z) = \frac{B(z)}{A(z)} = \frac{0.848z^2 - 0.627z + 0.313}{z^3 - 0.986z^2 + 0.888z - 0.370} \qquad (9.39)$$

a noise model according to

$$H_w(z) = \frac{C(z)}{A(z)} = \frac{z^3 - 1.8z^2 + 0.97z}{z^3 - 0.986z^2 + 0.888z - 0.370} \qquad (9.40)$$

and a noise variance $\mathcal{E}\{w^2\} = \sigma^2 = 1$. Diagrams showing input and output data are found in Fig. 9.1. Maximum-likelihood identification with $N = 1000$ samples of input and output provides parameter estimates according to Table 9.1 for various model orders. The loss function (= estimated variance), also given in Table 9.1, is a continuously decreasing function of the order $n$ of the estimated model. The F-test gives a clear indication that there is no significant change of the loss function for a model order higher than $n = 3$. Thus far the validation procedure supports the choice of a linear model of order $n = 3$. ■
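To experiment with the tests above, data can be generated from the system (9.39)-(9.40). The sketch writes $A$, $B$, and $C$ in powers of $z^{-1}$ and filters with zero initial conditions. The helper `lfilter` is hypothetical (it follows the coefficient convention of `scipy.signal.lfilter` but is implemented directly), and the random binary input is an assumption, not the input shown in Fig. 9.1.

```python
import numpy as np

def lfilter(b, a, x):
    """Zero-initial-state filter a(q) y = b(q) x; coefficients in powers of q^{-1}."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k >= i)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k >= j)
        y[k] = acc / a[0]
    return y

# System (9.39)-(9.40) written in powers of z^{-1}
A = [1.0, -0.986, 0.888, -0.370]
B = [0.0, 0.848, -0.627, 0.313]
C = [1.0, -1.8, 0.97]

def simulate_dc_drive(u, w):
    """Output y = (B/A) u + (C/A) w with zero initial conditions."""
    return lfilter(B, A, u) + lfilter(C, A, w)

rng = np.random.default_rng(0)
u = np.sign(rng.standard_normal(1000))  # hypothetical random binary input
w = rng.standard_normal(1000)           # white noise with variance 1
y = simulate_dc_drive(u, w)
```

The static gain of (9.39) is $B(1)/A(1) = 0.534/0.532$, which the step response of `lfilter(B, A, ...)` approaches after a few dozen samples.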


Test of normality 

One test of normality has been encountered already in the context of autocorrelation and cross correlation, with a 95%-confidence interval indicated in the autocorrelation and cross-correlation diagrams. Equation (9.29) can thus be used for the design of tests of normality of the residuals. Also, the autocorrelation test statistic (9.30), which for large $m$ is well approximated by a normal distribution $\mathcal{N}(m, 2m)$, can be viewed as a test of normality; see Fig. 9.7.

Figure 9.7 Empirical distribution of residuals for DC-drive models of model orders n = 1-4 (solid line). The corresponding normal distribution (dotted line) is indistinguishable from the empirical distributions except for n = 1.


A straightforward test of whether the distribution of residuals is normal is offered by the Kolmogorov-Smirnov test. The difference between the empirical distribution function $F_e$ of the residuals and the assumed normal distribution function $F_\varepsilon$ can be used as a statistic in determining whether or not to accept the null hypothesis:

$$T_{KS} = \sup_x |F_e(x) - F_\varepsilon(x)| \qquad (9.41)$$

where the empirical distribution function is

$$F_e(x) = \begin{cases} 0, & x < \varepsilon_{(1)} \\ k/N, & \varepsilon_{(k)} \le x < \varepsilon_{(k+1)}, \quad k = 1, 2, \ldots, N-1 \\ 1, & x \ge \varepsilon_{(N)} \end{cases} \qquad (9.42)$$

and $\{\varepsilon_{(k)}\}$ is a permutation of the residuals $\{\varepsilon_k\}$ obtained by sorting the residual sequence in ascending order. There are asymptotic formulae for sample sizes $N > 100$:

Significance level    Acceptance limit
p = 0.05              T_KS < 1.36/√N
p = 0.01              T_KS < 1.63/√N        (9.43)

[Figure 9.8 panels; horizontal axes: model order]

Figure 9.8 Test statistics and rejection thresholds for the autocorrelation, cross-correlation, zero crossings, and Kolmogorov-Smirnov tests. Confidence intervals at significance level p < 0.05 (dotted line) and p < 0.01 (dashed line) are indicated.
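The Kolmogorov-Smirnov statistic can be evaluated exactly at the sorted residuals, because the supremum of the difference between the empirical and the normal distribution functions is attained at a jump of the empirical step function. In this sketch the reference is a zero-mean normal distribution whose standard deviation defaults to the root-mean-square residual, which is an assumption; Python's `statistics.NormalDist` supplies the normal distribution function. Strictly, estimating the standard deviation from the same residuals makes the limits (9.43) somewhat conservative.

```python
import numpy as np
from statistics import NormalDist

def kolmogorov_smirnov_normal(eps, sigma=None):
    """T_KS: largest distance between empirical and normal distribution functions.

    The reference is a zero-mean normal distribution; sigma defaults to the
    root-mean-square residual (an assumption of this sketch).
    """
    e = np.sort(np.asarray(eps, dtype=float))
    n = len(e)
    if sigma is None:
        sigma = float(np.sqrt(np.mean(e ** 2)))
    normal = NormalDist(0.0, sigma)
    cdf = np.array([normal.cdf(x) for x in e])
    k = np.arange(1, n + 1)
    # The empirical function jumps from (k-1)/N to k/N at the k-th sorted value
    return float(max(np.max(k / n - cdf), np.max(cdf - (k - 1) / n)))

def ks_limit(n, p=0.05):
    """Asymptotic acceptance limit from (9.43), intended for N > 100."""
    return {0.05: 1.36, 0.01: 1.63}[p] / np.sqrt(n)
```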


Example 9.6—Validation of a DC-drive model (cont’d.) 

The Kolmogorov-Smirnov test statistic was computed for various model
orders, all of which fall within the margin of acceptance as shown in Fig. 9.8.
Obviously, this statistic does not provide any means of distinguishing between 
models of different orders in this example. ■ 
Zero crossings

Consider the number of zero crossings of the residuals

$$T_x = \sum_{k=1}^{N-1} x_k \qquad (9.44)$$

where a zero crossing $x_k$ at time $k$ is calculated as

$$x_k = \begin{cases} 1, & \text{if } \varepsilon_k \varepsilon_{k+1} < 0 \\ 0, & \text{if } \varepsilon_k \varepsilon_{k+1} > 0 \end{cases} \qquad (9.45)$$

Assuming the $x_k$ to be independent variables which take on the values 0 and 1 with equal probability under the null hypothesis $H_0$, the test statistic $T_x$ for large $N$ is asymptotically distributed as

$$T_x \in \mathcal{N}\left(\frac{N}{2}, \frac{N}{4}\right) \qquad (9.46)$$

A two-sided test for zero crossings at the significance level 0.05 is

$$-1.96 < \frac{T_x - N/2}{\sqrt{N/4}} < 1.96 \qquad (9.47)$$

or

$$\frac{N}{2} - 1.96\sqrt{\frac{N}{4}} < T_x < \frac{N}{2} + 1.96\sqrt{\frac{N}{4}} \qquad (9.48)$$

which leads to a test criterion stating that the number of zero crossings for $N = 1000$ should, with 95% probability, be in the interval [469, 531].
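A sketch of the zero-crossing test (9.44)-(9.48) is given below. Products that are exactly zero are counted as non-crossings, which is a choice the text leaves open; the function name is hypothetical.

```python
import math
import numpy as np

def zero_crossing_test(eps, z=1.96):
    """Zero-crossing count T_x and the two-sided interval (9.48)."""
    e = np.asarray(eps, dtype=float)
    n = len(e)
    # x_k = 1 when consecutive residuals have opposite signs, cf. (9.45)
    t_x = int(np.sum(e[:-1] * e[1:] < 0))
    half_width = z * math.sqrt(n / 4)
    lo, hi = n / 2 - half_width, n / 2 + half_width
    return t_x, (lo, hi), lo < t_x < hi
```

For $N = 1000$ the default gives the interval (469.01, 530.99); with `z = 2.576` and `z = 3.29` the same function gives, after rounding, the 99% and 99.9% intervals [459, 541] and [448, 552].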


Example 9.7—Validation of a DC-drive model (cont’d.) 

The result of the zero crossings test applied to the model of the DC-motor drive is shown in Fig. 9.8. The estimated variance, AIC, FPE, and MDL as shown in Fig. 9.3 all point to the choice of a third- or fourth-order model. The following confidence intervals are given for the number of zero crossings:

Significance level    Interval
p < 0.05              [469, 531]
p < 0.01              [459, 541]
p < 0.001             [448, 552]        (9.49)

The number of zero crossings, the autocorrelation, and the cross-correlation statistics shown in Fig. 9.8 all indicate that the third-order model is appropriate. The Kolmogorov-Smirnov test statistic does not distinguish between the different model orders. Also, the residual correlation tests in Fig. 9.5 and Fig. 9.6 indicate that the first- and second-order models are insufficient to explain data but that the third-order model is the appropriate model order. ■
Figure 9.9 Bode diagram, noise spectrum, and zero-pole diagram of the DC-drive system (solid line) and the estimated model (dashed line). The zero-pole diagram contains the poles and the zeros of the B- and C-polynomials.


MODEL AND PARAMETER ACCURACY 

Several of the statistical methods already presented have been helpful in distinguishing the correct model order from other more or less correct models. It has not, however, been shown that the model chosen according to these methods is sufficiently accurate for the purpose of modeling. It is therefore indispensable to consider the model performance and behavior in comparison with real data. At least three different methods are relevant in this context.

First, a stochastic simulation, where both the deterministic input and the residuals of identification are used as inputs, is an effective means of checking whether the model reproduces the observed data. This test should give a close fit between observed and simulated data. A poor reproduction of data may indicate that the numerical procedures of parameter optimization have failed.

Figure 9.10 Impulse and step responses for a DC-motor system (solid line) and the identified model (dashed line) in both discrete time (upper) and continuous time (lower).
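The stochastic simulation can be illustrated for an ARMAX model $A(q)y = B(q)u + C(q)\varepsilon$. In this sketch the residuals are computed by inverse filtering, $\varepsilon = (Ay - Bu)/C$, and then fed back through the model together with the input. With zero initial conditions and the same model, the recorded output is reproduced up to rounding error, so a visible mismatch points to initial-condition effects or to numerical problems. The helper names are hypothetical, and `lfilter` is a plain zero-initial-state difference-equation filter with the same coefficient convention as `scipy.signal.lfilter`.

```python
import numpy as np

def lfilter(b, a, x):
    """Zero-initial-state filter a(q) y = b(q) x; coefficients in powers of q^{-1}."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k >= i)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k >= j)
        y[k] = acc / a[0]
    return y

def armax_residuals(A, B, C, u, y):
    """Residuals of A y = B u + C eps, obtained as eps = (A y - B u) / C."""
    return lfilter(A, C, y) - lfilter(B, C, u)

def stochastic_simulation(A, B, C, u, eps):
    """Model output driven by the recorded input and the residual sequence."""
    return lfilter(B, A, u) + lfilter(C, A, eps)
```

Comparing `stochastic_simulation(A, B, C, u, armax_residuals(A, B, C, u, y))` with the recorded `y` is then the check described above.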

Second, a deterministic simulation can be used, where real data are compared with the model response to the recorded input signal used in the identification. This test should ascertain whether the model response is comparable to real data in magnitude and response delay. This test sometimes fails despite promising results having been obtained in previous statistical tests. Such a result makes sense in a case where the identification has determined an adequate stochastic model but where the input-output behavior has been compromised. A failing deterministic simulation often indicates that the input amplitude during experiments is inadequate, so the experiments that generate data may have to be redesigned. Another source of problems is model complexity, where an overly simple model tends to give poor input-output behavior. Thus, recourse to nonlinear models, etc., often appears to be justified at this stage of identification.
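For the deterministic simulation the model is driven by the recorded input only, $\hat y = (B/A)u$, and compared with the data. The normalized fit measure used below, $100\,(1 - \|y - \hat y\|/\|y - \bar y\|)$, is an assumption of this sketch rather than a criterion given in the text: 100 means a perfect match and 0 means no better than the mean value. The function names are hypothetical.

```python
import numpy as np

def lfilter(b, a, x):
    """Zero-initial-state filter a(q) y = b(q) x; coefficients in powers of q^{-1}."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k >= i)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k >= j)
        y[k] = acc / a[0]
    return y

def deterministic_simulation(A, B, u):
    """Noise-free model response y_hat = (B/A) u to the recorded input."""
    return lfilter(B, A, u)

def fit_percent(y, y_hat):
    """Normalized fit 100 (1 - ||y - y_hat|| / ||y - mean(y)||) (assumed measure)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - y.mean()))
```

Because the recorded output also contains the noise contribution, a good model will not reach 100 with this measure; the value is mainly useful for comparing candidate models on the same data.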
Third, a cross-validation simulation can be made by applying input data not
previously used in the identification. The comparison between the observed 
data and the model output is usually very revealing with regard to model 
anomalies not previously detected. A residual analysis applied to the misfit 
between model and data is also valuable in order to determine whether the 
model complexity is adequate. Failure to pass the cross-validation test may 
also indicate that the system is not time-invariant. 

A problem that appears in physical modeling is whether the estimated model is compatible with a priori knowledge about the system behavior and its parameters. It is valuable in this context to evaluate information from the zero-pole diagram (see Fig. 9.9) and the model step and impulse responses (see Fig. 9.10).

Another problem is to ascertain whether the estimated parameters are statistically consistent. An empirical approach to this problem is to consider the parameter variance estimates as the number of observations used in the identification increases. A decrease of the estimated parameter variance as $N$ increases supports the choice of model structure, whereas a constant or increasing parameter variance may indicate that the parameter estimates are statistically inconsistent. Theoretical approaches to justifying claims of consistency are based on normality assumptions and/or the central limit theorem, both of which have a limited scope of application.

Continuous-time transfer functions and state-space realizations of the type (7.38) are examples of physical parametrizations. It is sometimes of interest to monitor individual physical parameters by means of identification. A problem in this context is that the physical parameters are present in the transfer functions as aggregate parameters, which, of course, complicates interpretation of the estimated coefficients. For the same reason it is difficult to calculate precise estimates of the uncertainty of the physical parameter estimates obtained.


Example 9.8—Validation of a DC-drive model (cont’d.)

A deterministic simulation (Fig. 9.11) of the estimated model, using the recorded input as input to the model, shows reasonable agreement between the simulated output and the recorded output despite the unfavorable signal-to-noise ratio (S/N = 1). Also, the stochastic simulation in Fig. 9.11, with a noise input chosen as the computed residuals, provides good agreement between original data and the behavior of the estimated model.
Figure 9.11 Deterministic and stochastic simulations (dashed line) of the recorded output by applying the recorded input sequence (and the residuals as noise input) to the estimated model. The recorded output is shown for comparison (solid line).


The discrete-time estimated model with sampling period $h = 0.1$ s is

$$Y(z) = \frac{0.8102z^2 - 0.5814z + 0.2372}{z^3 - 1.0643z^2 + 0.9233z - 0.4001}\,U(z) + \frac{z^3 - 1.8839z^2 + 1.1267z - 0.0927}{z^3 - 1.0643z^2 + 0.9233z - 0.4001}\,E(z) \qquad (9.50)$$


Transformation of this discrete-time model to a continuous-time model gives

$$\hat{G}(s) = \frac{Y(s)}{U(s)} = \frac{8.57s^2 + 102s + 836}{s^3 + 9.16s^2 + 192s + 824} \qquad (9.51)$$


Matching this estimated transfer function with the analytically derived transfer function from a physical parametrization (7.38) would give

$$G(s) = \frac{Y(s)}{U(s)} = \frac{J_2 s^2 + d_2 s + k}{J_1 J_2 s^3 + J_1 d_2 s^2 + k(J_1 + J_2)s + k d_2} = \frac{10s^2 + 100s + 1000}{s^3 + 10s^2 + 200s + 1000} \qquad (9.52)$$
for

$$\begin{pmatrix} J_1 \\ J_2 \\ k \\ d_2 \end{pmatrix} = \begin{pmatrix} 0.1 \\ 0.1 \\ 10 \\ 1 \end{pmatrix} \qquad (9.53)$$


The transfer function (9.52) is clearly a lumped parameter system that gives an overdetermined system of six nonlinear equations and four unknown variables $J_1$, $J_2$, $k$, and $d_2$ to solve. A least-squares solution to this problem is

$$\begin{pmatrix} \hat J_1 \\ \hat J_2 \\ \hat k \\ \hat d_2 \end{pmatrix} = \begin{pmatrix} 0.111 \\ 0.100 \\ 9.39 \\ 0.984 \end{pmatrix} \qquad (9.54)$$

which shows that a close relationship exists between original and estimated parameters.

An unsatisfactory point is clearly the complicated translation of the estimated parameters into physical parameters, and problems of this type are current issues in research. Some of these problems are addressed in the following chapters on continuous-time models and nonlinear identification.
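The least-squares matching of (9.52) can be approached in several ways. The sketch below is a simplification: after division by $J_1 J_2$, five of the six coefficient relations become linear in the logarithms of $J_1$, $J_2$, $k$, and $d_2$, so a linear least-squares problem is solved with `numpy.linalg.lstsq`, and the remaining relation (the $s^1$ coefficient of the denominator) is returned as a misfit for checking. Because this weighting differs from a direct least-squares fit of all six coefficients, its result for (9.51) need not coincide with (9.54). The function name is hypothetical.

```python
import numpy as np

def physical_parameters(b2, b1, b0, a2, a1, a0):
    """Estimate (J1, J2, k, d2) from the normalized coefficients of (9.52).

    Relations after division by J1 J2:
      b2 = 1/J1,  b1 = d2/(J1 J2),  b0 = k/(J1 J2),
      a2 = d2/J2, a1 = k (J1 + J2)/(J1 J2), a0 = k d2/(J1 J2).
    All except a1 are linear in (log J1, log J2, log k, log d2).
    """
    M = np.array([[-1.0, 0.0, 0.0, 0.0],    # log b2
                  [-1.0, -1.0, 0.0, 1.0],   # log b1
                  [-1.0, -1.0, 1.0, 0.0],   # log b0
                  [0.0, -1.0, 0.0, 1.0],    # log a2
                  [-1.0, -1.0, 1.0, 1.0]])  # log a0
    rhs = np.log([b2, b1, b0, a2, a0])
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    j1, j2, k, d2 = np.exp(x)
    a1_misfit = k * (j1 + j2) / (j1 * j2) - a1  # unused relation, kept as a check
    return (float(j1), float(j2), float(k), float(d2)), float(a1_misfit)
```

For instance, `physical_parameters(8.57, 102, 836, 9.16, 192, 824)` applies it to the coefficients of (9.51), reading the numerator middle term as $102s$.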


9.6 CLASSIFICATION WITH THE FISHER LINEAR DISCRIMINANT 

A standard problem of applied parameter estimation is that a parameter set $\theta$ may belong to any of a number of sets. Common examples are the classification of time-varying noise dynamics or of changes in operating conditions, which give rise to a number of different parameter sets. In such circumstances it is relevant to find criteria to verify whether $\theta$ belongs to an expected set or not.

Consider the problem of classification of a parameter estimate $\hat\theta$ that might belong to either of two classes $\mathcal{A}$ and $\mathcal{B}$. Assume a parameter estimate $\hat\theta$ to be an estimate of either the parameter vector $\theta_A \in \mathcal{A}$ or $\theta_B \in \mathcal{B}$. The two classes $\mathcal{A}$ and $\mathcal{B}$ may each contain elements with a certain parametric variability. Assume the mean $m$ and the covariance $R$ to be

$$\mathcal{E}\{\theta_i\} = m_i \quad\text{and}\quad \mathcal{E}\{(\theta_i - m_i)(\theta_i - m_i)^T\} = R_i, \qquad i = A \text{ or } B \qquad (9.55)$$

One means of distinguishing between the two classes is the difference of the sample means. Suppose we form a linear combination of the components of $\theta$ of the form $\lambda^T\theta$, and we want the projections of $\theta$ from different classes onto the line defined by $\lambda$ to be well separated, with the separation reflecting the sample means as well as the average variance

$$R = \frac{1}{2}(R_A + R_B) \qquad (9.56)$$

Then we can compare and classify the projected samples using a single number. To accomplish this we must determine a hyperplane, with normal vector $\lambda$, that separates the two classes such that a given element can be classified as belonging either to $\mathcal{A}$ or to $\mathcal{B}$. It is also of interest to use $\lambda$ to determine a measure of separation $\mu$

$$\mu = \lambda^T(m_B - m_A) \qquad (9.57)$$


To determine $\lambda$, we maximize the function

$$J(\lambda) = \frac{\left(\lambda^T(m_B - m_A)\right)^2}{\lambda^T R \lambda} \qquad (9.58)$$

where the denominator normalizes $J$ with respect to the scattering of the projected samples. The Fisher linear discriminant is then

$$\lambda = R^{-1}(m_B - m_A) \qquad (9.59)$$


and a parameter vector $\theta$ may now be classified by calculation of

$$\lambda^T\theta = (m_B - m_A)^T R^{-1}\theta \qquad (9.60)$$


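Classification according to the Fisher linear discriminant is done by testing

$$\theta \in \begin{cases} \mathcal{B} & \text{if } \lambda^T\theta > \vartheta + \delta \\ \text{Uncertain} & \text{if } |\lambda^T\theta - \vartheta| \le \delta \\ \mathcal{A} & \text{if } \lambda^T\theta < \vartheta - \delta \end{cases} \qquad (9.61)$$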


for some threshold $\vartheta$ and a region of uncertainty parametrized by $\delta$. A possible choice of the threshold $\vartheta$ is

$$\vartheta = \frac{\sqrt{\lambda^T R_A \lambda}\;\lambda^T m_A + \sqrt{\lambda^T R_B \lambda}\;\lambda^T m_B}{\sqrt{\lambda^T R_A \lambda} + \sqrt{\lambda^T R_B \lambda}} \qquad (9.62)$$

The threshold value is then chosen in such a way that its distances to $\lambda^T m_A$ and $\lambda^T m_B$ are inversely proportional to the standard deviations as determined by $R_A$ and $R_B$ of the two classes, respectively. The value of $\delta$ may be chosen so that classification is avoided for elements appearing in the interval where the two distributions overlap.

Figure 9.12 Fisher linear discriminant to separate two sets. The samples projected onto the line determined by the Fisher linear discriminant appear to be well separated.
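A sketch of the Fisher discriminant and the three-way rule (9.61) follows. The covariance matrices are those of Example 9.9, but the mean values and the midpoint threshold $\vartheta = \lambda^T(m_A + m_B)/2$ are hypothetical choices for illustration; the threshold (9.62) could be substituted. Function names are hypothetical.

```python
import numpy as np

def fisher_discriminant(m_a, m_b, r_a, r_b):
    """lambda = R^{-1} (m_B - m_A) with R = (R_A + R_B)/2, cf. (9.56) and (9.59)."""
    r = 0.5 * (np.asarray(r_a, dtype=float) + np.asarray(r_b, dtype=float))
    return np.linalg.solve(r, np.asarray(m_b, dtype=float) - np.asarray(m_a, dtype=float))

def classify(theta, lam, threshold, delta):
    """Three-way decision rule (9.61) on the projection lambda^T theta."""
    s = float(lam @ np.asarray(theta, dtype=float))
    if s > threshold + delta:
        return "B"
    if s < threshold - delta:
        return "A"
    return "uncertain"

# Covariances from Example 9.9; the means below are hypothetical.
R_A = [[1.5, -0.5], [-0.5, 1.5]]
R_B = [[1.5, 0.5], [0.5, 1.5]]
m_A, m_B = np.array([0.0, 0.0]), np.array([2.0, 2.0])
lam = fisher_discriminant(m_A, m_B, R_A, R_B)
threshold = float(lam @ (m_A + m_B) / 2)  # midpoint threshold (an assumption)
```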


Example 9.9—Fisher linear discriminant 

Consider the parameter classes $\mathcal{A}$ and $\mathcal{B}$ with mean values $m_A$ and $m_B$ and the covariance matrices

$$R_A = \begin{pmatrix} 1.5 & -0.5 \\ -0.5 & 1.5 \end{pmatrix} \quad\text{and}\quad R_B = \begin{pmatrix} 1.5 & 0.5 \\ 0.5 & 1.5 \end{pmatrix} \qquad (9.63)$$

The Fisher linear discriminant to separate the two classes $\mathcal{A}$ and $\mathcal{B}$ is

(9.64)

and the threshold chosen is

$$\vartheta = 0.4575 \qquad (9.65)$$

As shown in Fig. 9.12, the Fisher linear discriminant is effective in separating observations from the two classes. The samples and their projections onto the Fisher linear discriminant are well separated except for two samples which fall within the uncertainty interval. ■

The method can be extended for classification into several classes. 


9.7 *THE CONCEPT ‘IDENTIFIABILITY’

It was shown in Examples 8.1 and 8.2 that there exist experimental conditions under which a system $S$ cannot be uniquely identified. Depending on the point of view, this problem can be regarded as a problem of parametrization or as a problem originating from a poorly conducted experiment; both aspects are sometimes discussed in terms of identifiability, a notion which has been the subject of a long and, as yet, unfinished discussion.

The question of whether a certain experiment is sufficiently informative for, 
say, estimation of p parameters can be approached by reference to the notion 
of excitation (see Chapter 5). The problem arising in Examples 8.1 and 8.2, 
for instance, can be explained by the fact that the regressor matrix does not 
have full rank. For the remaining part of this section, we assume that the 
experiment has been performed with an excitation that is sufficient relative 
to the desired model complexity. Thus we avoid further interpretation of poor 
identifiability due to lack of excitation. 

Instead of postulating a set of linear systems, a model set $\mathcal{M}$ can be determined from physical modeling or, sometimes, from the gross behavior of data. For instance, the observation of an output delayed relative to the input, or the observation of an oscillatory behavior, may be sufficient to suggest a certain model set. Such a practice, sometimes called structural identification, is predicated upon the idea that a model set containing as yet unspecified parameters can be postulated.

Identification methods are based on specific choices for a model set $\mathcal{M}$, its parametrization, and an identification method $\mathcal{I}$ determined by an optimization criterion. A particular model of the set $\mathcal{M}$, described by a parameter vector $\theta$, may thus be designated $\mathcal{M}(\theta)$. The parametrization is said to be unique only if, for two parameter vectors $\theta_1$ and $\theta_2$, it holds that

$$\mathcal{M}(\theta_1) = \mathcal{M}(\theta_2) \;\Rightarrow\; \theta_1 = \theta_2 \qquad (9.66)$$
An example of a unique parametrization is transfer function modeling of a single-input single-output system by means of a rational function in the indeterminate $s$ with coprime numerator and denominator polynomials, so that

$$S \in \mathcal{M} \;\Rightarrow\; \{\theta : S = \mathcal{M}(\theta)\} = \{\theta_1\}, \qquad \theta_1 \in \mathbb{R}^p \qquad (9.67)$$

where $\theta_1$ is the unique parameter vector that parametrizes $S$ as a model of the set $\mathcal{M}$.

Identifiability is thus a property of a parametrization, assuming that there is a unique a priori system representation which, of course, is independent of the experimental procedure. Identifiability in this form can be used as a notion to describe the ability to correctly estimate parameters in some process and is thus closely related to the idea of a unique model and an associated unique minimum of an identification criterion. Given a parametrized model set $\mathcal{M}$ with a unique parameter vector $\theta$ such that $S = \mathcal{M}(\theta)$, it is necessary to formulate a suitable identification criterion and, by extension, an identification method that enables the estimation of $\theta$. The system is said to be identifiable if it is possible to design such a procedure. Prediction error methods applied to single-input, single-output ARMAX models constitute one such category of procedures, in which the sample covariance matrix of the prediction error is minimized according to a suitable scalar optimization criterion. In this form, identifiability is closely related to consistency properties of an estimation method. In fact, such identifiability can be regarded as a set of stipulative definitions of the conditions that imply consistency.

Assuming identifiability thus leads to powerful statements on consistency that 
are valuable for the analysis of limiting properties of parameter estimates. As 
assumptions on identifiability may also be restrictive, however, it is difficult 
to approach identification of state-space models and multi-input, multi-output 
systems for which, in general, there exists no unique parametrization. 

Identifiability concepts also affect the validation procedure. Traditional approaches to solving the validation problem involve evaluation of the model misfit in the form of a residual sequence, assuming that the residual time series is a realization of a stochastic process. Statistical hypothesis testing of the stochastic nature of the residuals is then used, and the model is rejected when it has been refuted by data in a number of such statistical tests.

This circumstance raises yet another problem, as several models suggested 
for explanation of a data record will be unable to fit data in the statistical 
sense. In particular, it is necessary to distinguish between the lack of fit 
between model and data due to random processes and that due to lack of 
model complexity. It is, of course, also relevant to provide methods which give
accurate results in the sense of some approximation criterion. This is a topic 
of Chapter 10. The combined problem of simultaneous model approximation 
and stochastic modeling remains to be solved. Among the methods already 
available, output error estimation seems to address this problem in the most 
straightforward manner. 


9.8 CONCLUDING REMARKS 

A difficult aspect of structure determination is to obtain a meaningful optimization criterion that provides a good compromise between simplicity and the better fit to data offered by a more complex model. The statistical tests presented are formulated as decision problems at a given significance level $\alpha$, say $\alpha = 0.05$ (95% confidence). Acceptability of a model is usually formulated as the decision problem of accepting or rejecting the null hypothesis $H_0$. It is important to bear in mind that a serious problem is the risk of accepting $H_0$ when it is not true, i.e., when the alternative hypothesis should be accepted. As no statistical properties are tied to the alternative hypothesis, it is impossible to quantify the risk of accepting a false null hypothesis, i.e., the risk of accepting a wrong model. This is clearly an inherent weakness of the statistical tests presented. Acceptance of models based on statistical methods only must therefore be discouraged, as there is no way of verifying the validity of the decision. For this reason it is important to pay some attention to the simulation performance and to other methods that are of a different nature from statistical tests.
9.10 EXERCISES


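9.1 Show that the covariance function of the least-squares estimate of the parameters of the linear model $Y_N = \Phi_N\theta + \varepsilon$ with $\mathcal{E}\{\varepsilon\} = 0$ and $\mathcal{E}\{\varepsilon\varepsilon^T\} = \Sigma_\varepsilon$ can be expressed as

$$\begin{pmatrix} I & \Phi_N \\ \Phi_N^T & 0 \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_\varepsilon & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I & \Phi_N \\ \Phi_N^T & 0 \end{pmatrix}^{-1} \qquad (9.68)$$

Hint: Use the results previously derived for the augmented system matrix (5.104) and (5.117).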

9.2 Determine the covariance between $\hat\varepsilon_N$ and $\hat\theta_N$ for a least-squares estimate (cf. Exercise 9.1). ■
Model Approximation


10.1 INTRODUCTION 

Model approximation and model reduction refer to methods for simplification, approximation, and order reduction of models of dynamic systems. Model approximation is important for extracting the dominant features of a model, for reducing a high-order time-series model to a lower-order structure motivated by physical considerations, and for evaluating the relative importance of subsystems in some large-scale system.
The term model reduction may cover several aspects, such as (1) model order reduction in a linear system, (2) model approximation of a nonlinear differential equation by a linear system, or (3) approximation of the nonlinear system by ignoring higher-order harmonics.

The first aspect is elaborated first: we start this chapter by presenting some problems associated with heuristic model reduction methods and then proceed to some systematic methods of model reduction based on balanced realization. Other methods, such as the Padé approximation, moment matching, and continued fraction approximations, are presented with some attention to their shortcomings and drawbacks.

The second and third aspects above can be approached by various types of linearization techniques, e.g., standard linearization and harmonic linearization. Model approximation of nonlinear systems is presented in the form of linearization and describing function analysis. Finally, the chapter includes a perspective on the use of model approximation methods in identification.

We start by considering some familiar model approximation methods that are 
basic in system analysis, i.e., linearization and discretization. 

Linearization 

A fundamental method is linearization of the nonlinear differential equation

$$\dot{x} = f(x, u) \qquad (10.1)$$

which is replaced by the approximate equation

$$\dot{x} \approx f_0 + f_x(x_0, u_0)(x - x_0) + f_u(x_0, u_0)(u - u_0) \qquad (10.2)$$

obtained from a truncated Taylor series expansion around a linearization point $(x_0, u_0)$, where $f_0 = f(x_0, u_0)$. The partial derivatives $f_x = \partial f/\partial x$ and $f_u = \partial f/\partial u$ are evaluated at the point $(x_0, u_0)$.
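The Jacobians in (10.2) can be approximated numerically when $f$ is only available as a function. This sketch uses central differences with a fixed step `h`, which is an assumption; analytic derivatives are preferable when they are available. The pendulum model used in the check is a hypothetical example.

```python
import numpy as np

def linearize(f, x0, u0, h=1e-6):
    """Return f0 and central-difference Jacobians f_x, f_u of x' = f(x, u) at (x0, u0)."""
    x0 = np.asarray(x0, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    f0 = np.asarray(f(x0, u0), dtype=float)
    fx = np.zeros((f0.size, x0.size))
    fu = np.zeros((f0.size, u0.size))
    for i in range(x0.size):
        dx = np.zeros_like(x0)
        dx[i] = h
        fx[:, i] = (f(x0 + dx, u0) - f(x0 - dx, u0)) / (2 * h)
    for j in range(u0.size):
        du = np.zeros_like(u0)
        du[j] = h
        fu[:, j] = (f(x0, u0 + du) - f(x0, u0 - du)) / (2 * h)
    return f0, fx, fu

# Hypothetical example: pendulum x1' = x2, x2' = -sin(x1) + u
pendulum = lambda x, u: np.array([x[1], -np.sin(x[0]) + u[0]])
f0, fx, fu = linearize(pendulum, [0.0, 0.0], [0.0])
```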

Discretization 

Discretization of system dynamics is a form of approximation where the original system and its approximation are equal or close at the time instants defined by the discretization or measurement. A linear system with input $u(t)$ constant between the sampling instants, i.e., step-response equivalence or zero-order-hold input, can be discretized in time as

$$\begin{cases} \dot{x}(t) = Fx(t) + Gu(t) \\ y(t) = Cx(t) \end{cases} \quad\Rightarrow\quad \begin{cases} x_{k+1} = \Phi x_k + \Gamma u_k \\ y_k = Cx_k \end{cases} \qquad (10.3)$$

where $x_k = x(t_k)$ at some sequence of time instants $\{t_k\}$. For equidistant time instants $t_k = kh$ there is a transformation

$$\Phi = e^{Fh}, \qquad \Gamma = \int_0^h e^{Fs}\,ds\; G \qquad (10.4)$$
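The zero-order-hold matrices (10.4) can be obtained from a single matrix exponential, since $\exp\left(\begin{pmatrix} F & G \\ 0 & 0 \end{pmatrix} h\right) = \begin{pmatrix} \Phi & \Gamma \\ 0 & I \end{pmatrix}$. In practice `scipy.linalg.expm` would be used; to keep the sketch dependency-free, a simple scaling-and-squaring Taylor series is written out instead, which is adequate for small, well-scaled matrices but is not a production-quality exponential. Function names are hypothetical.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0 ** s  # scaled so that ||A||_1 <= 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E  # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
    return E

def zoh_discretize(F, G, h):
    """Return (Phi, Gamma) of (10.4) from one augmented matrix exponential."""
    F = np.atleast_2d(np.asarray(F, dtype=float))
    G = np.atleast_2d(np.asarray(G, dtype=float))
    n, m = G.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = F
    M[:n, n:] = G
    E = expm_taylor(M * h)
    return E[:n, :n], E[:n, n:]
```

For a double integrator with $h = 0.5$, the result is $\Phi = \begin{pmatrix} 1 & 0.5 \\ 0 & 1 \end{pmatrix}$ and $\Gamma = (0.125,\ 0.5)^T$, matching the closed-form expressions $h$ and $h^2/2$.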


Model structures of the type found in Eq. (10.1) or Eq. (10.3) are often tacitly 
assumed in control systems analysis, although such models are indeed the 
result of modeling and identification. 


Some heuristic model reduction methods 

Popular methods of model order reduction are polynomial truncation, the method of dominating poles, and pole-zero cancellations, which are all applicable to linear systems only. Consider, for example, a linear system with the transfer function

$$H(z) = \frac{0.22z^{-1}}{1 - 0.7z^{-1} - 0.08z^{-2}} = \frac{0.22z^{-1}}{(1 - 0.8z^{-1})(1 + 0.1z^{-1})} \qquad (10.5)$$


There is obviously a considerable difference between the two time constants. It is therefore reasonable to search for methods that reduce the model order without seriously affecting the input-output behavior. A natural but poor method of model reduction is simply to truncate the numerator and denominator polynomials (polynomial truncation) with a subsequent compensation of the static gain (i.e., at $z = 1$)

$$H(z) \approx H_1(z) \qquad (10.6)$$

It is easy to verify by impulse response and step response that $H_1$ is a poor approximation of $H(z)$; see Fig. 10.1. This approximation is poor also if the truncation is supported by compensation to maintain the same static gain. The reason is that the pole-zero location is very sensitive to the higher-order coefficients. (Similar problems appear in a realization of Eq. (10.5) by methods of limited parameter accuracy.)

It often makes more sense to keep the dominating poles and to eliminate fast modes while preserving the static gain (see Fig. 10.1). If this method is applied to Eq. (10.5) we have

$$H(z) \approx H_2(z) = \frac{0.2z^{-1}}{1 - 0.8z^{-1}} \qquad (10.7)$$
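The comparison in Fig. 10.1 between $H(z)$ and the dominating-pole model $H_2(z)$ can be reproduced in outline by simulating both step responses with the difference equations implied by (10.5) and (10.7). The sketch computes the step responses and their summed squared difference; the horizon of 40 samples is an arbitrary choice and the function name is hypothetical.

```python
import numpy as np

def step_response(b, a, n):
    """Step response of b(z^{-1}) / a(z^{-1}) with zero initial state."""
    y = np.zeros(n)
    for k in range(n):
        # u_k = 1 for k >= 0, so the input part is a partial sum of b
        acc = sum(b[i] for i in range(len(b)) if k >= i)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k >= j)
        y[k] = acc / a[0]
    return y

N = 40  # simulation horizon (arbitrary)
full = step_response([0.0, 0.22], [1.0, -0.7, -0.08], N)  # H(z), (10.5)
dominant = step_response([0.0, 0.2], [1.0, -0.8], N)      # H2(z), (10.7)
sse = float(np.sum((full - dominant) ** 2))
```

Both responses settle at the common static gain 1, so `sse` measures only the transient mismatch caused by discarding the fast pole at $-0.1$.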
Figure 10.1 The upper graph shows the step responses of Eq. (10.5) and its reduced-order models. The truncated pole polynomial model reduction exhibits a gross error. The lower graph shows the sum of squares of the model reduction error of the pole cancellation (dotted line), the dominating pole method (dotted and dashed line), moment matching (dashed line), and the balanced realization (solid line).


Pole-zero cancellation is often applied to transfer functions in the following manner

$$H_3(z) = \frac{z^{-1} - 0.78z^{-2}}{1 - 0.7z^{-1} - 0.08z^{-2}} \approx \frac{1.1z^{-1}}{1 + 0.1z^{-1}} \qquad (10.8)$$

where a numerator factor $(z - 0.78)$ has been cancelled with a denominator factor $(z - 0.8)$, with a compensation to maintain the static gain.


10.2 BALANCED REALIZATION AND MODEL REDUCTION


As shown in Fig. 10.1, it is, however, easy to demonstrate serious shortcomings of the presented heuristic methods according to criteria of preserved static gain, step responses, impulse responses, or least-squares fitting. It is therefore desirable to derive methods for model approximation based on some sensitivity analysis of the input-output properties. Interesting approaches in this context are balanced realizations and model approximation methods based on this methodology.

Consider the state-space equation

$$\begin{cases} x_{k+1} = \Phi x_k + \Gamma u_k, & x_k \in \mathbb{R}^n \\ y_k = C x_k \end{cases} \qquad (10.9)$$


A relevant question is to develop a quantitative measure of the observability and controllability of the states, which can be reformulated by means of the following question: What states can be reached with a given input energy, assuming that $x_0 = 0$? Consider for this reason the case of a finite input energy $J(u)$ where

$$J(u) = \sum_{k=0}^{N-1} u_k^T u_k \le 1 \qquad (10.10)$$

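Direct calculation of the state $x_N$ from an input sequence $\{u_k\}_{k=0}^{N-1}$ via Eq. (10.9) gives

$$x_N = \sum_{k=0}^{N-1} \Phi^{N-1-k}\Gamma u_k = \begin{pmatrix} \Phi^{N-1}\Gamma & \Phi^{N-2}\Gamma & \cdots & \Gamma \end{pmatrix} \begin{pmatrix} u_0 \\ \vdots \\ u_{N-1} \end{pmatrix} = \Psi_N U_N \qquad (10.11)$$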


where $\Psi_N$ and $U_N$ are defined from Eq. (10.11). As there are infinitely many control sequences $U_N$ that result in the state $x_N$ for $N > n$, it is suitable to choose the one with the smallest 2-norm. This control sequence, obtained by means of the pseudo-inverse of $\Psi_N$, is

$$U_N = \Psi_N^T(\Psi_N\Psi_N^T)^{-1}x_N \qquad (10.12)$$


A control sequence chosen according to Eq. (10.12) yields an input effort that satisfies

$$1 \ge J(u) = \sum_{k=0}^{N-1}u_k^Tu_k = U_N^TU_N = x_N^TP_N^{-1}x_N \tag{10.13}$$

with the reachability Gramian

$$P_N = \Psi_N\Psi_N^T = \sum_{k=0}^{N-1}\Phi^k\Gamma\Gamma^T(\Phi^T)^k \ge 0 \tag{10.14}$$

As $P_N$ is positive semidefinite (and positive definite when the system is reachable and $N \ge n$), Eq. (10.13) provides a quadratic form that bounds the states reachable at time $N$:

$$\sum_{k=0}^{N-1}u_k^Tu_k = U_N^TU_N \le 1 \;\Rightarrow\; x_N^TP_N^{-1}x_N \le 1, \quad \forall N \tag{10.15}$$
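A small NumPy sketch makes Eqs. (10.11) to (10.13) concrete. It is our own illustration: the system matrices are those of Example 10.1, while the horizon $N = 6$ and the target state are arbitrary choices. It builds $\Psi_N$, computes the minimum-norm input of Eq. (10.12), checks by simulation that $x_N$ is reached, and compares the input energy with $x_N^TP_N^{-1}x_N$:

```python
import numpy as np

def reachability_matrix(Phi, Gamma, N):
    """Psi_N = [Phi^(N-1) Gamma, ..., Phi Gamma, Gamma], so that x_N = Psi_N U_N for x_0 = 0."""
    return np.hstack([np.linalg.matrix_power(Phi, N - 1 - j) @ Gamma for j in range(N)])

Phi = np.array([[0.7, 0.08], [1.0, 0.0]])   # system of Example 10.1
Gamma = np.array([[1.0], [0.0]])
N = 6                                       # horizon, N > n = 2
x_target = np.array([0.3, -0.2])            # arbitrary target state

Psi = reachability_matrix(Phi, Gamma, N)
P_N = Psi @ Psi.T                                         # Eq. (10.14)
U = Psi.T @ np.linalg.solve(P_N, x_target)                # minimum-norm input, Eq. (10.12)
energy = float(U @ U)                                     # J(u) = U_N^T U_N
bound = float(x_target @ np.linalg.solve(P_N, x_target))  # x_N^T P_N^-1 x_N

# Simulate Eq. (10.9) from x_0 = 0 to confirm that x_N is reached
x = np.zeros(2)
for k in range(N):
    x = Phi @ x + Gamma[:, 0] * U[k]
print(x, energy, bound)
```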


It follows from Eq. (10.14) that the reachability Gramian $P_N$ satisfies the recursive equation

$$P_{N+1} = \Phi P_N\Phi^T + \Gamma\Gamma^T \tag{10.16}$$

For a stable matrix $\Phi$, $P_N$ converges as $N \to \infty$, and the asymptotic reachability Gramian $P = \lim_{N\to\infty}P_N$ satisfies the Lyapunov equation

$$\Phi P\Phi^T - P + \Gamma\Gamma^T = 0 \tag{10.17}$$
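The convergence of the recursion (10.16) to the solution of Eq. (10.17) is easy to observe numerically. In this minimal sketch, `dlyap` is a hypothetical helper of our own (not a library routine) that solves the Lyapunov equation by Kronecker vectorization, which is only sensible for small $n$:

```python
import numpy as np

def dlyap(A, W):
    """Solve A X A^T - X + W = 0 via row-major vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return x.reshape(n, n)

Phi = np.array([[0.7, 0.08], [1.0, 0.0]])   # stable: eigenvalues 0.8 and -0.1
Gamma = np.array([[1.0], [0.0]])

P_lyap = dlyap(Phi, Gamma @ Gamma.T)        # Eq. (10.17)

P_N = np.zeros((2, 2))                      # P_0 = 0 (empty sum)
for N in range(200):
    P_N = Phi @ P_N @ Phi.T + Gamma @ Gamma.T   # Eq. (10.16)
print(P_lyap)
```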

The observability Gramian 

Again consider the state-space equation (10.9) and the following question: Which initial states produce a specified output energy when $u_k = 0$? A derivation analogous to that of the reachability Gramian gives

$$1 = \sum_{k=0}^{\infty}y_k^Ty_k = x^T(0)Qx(0) \tag{10.18}$$

where the observability Gramian $Q$ is defined as the infinite sum

$$Q = \sum_{k=0}^{\infty}(\Phi^T)^kC^TC\Phi^k \tag{10.19}$$

and satisfies the Lyapunov equation

$$\Phi^TQ\Phi - Q + C^TC = 0 \tag{10.20}$$
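Eq. (10.18) can be checked by simulating the free response. In this sketch, `dlyap` is again our own vectorized Lyapunov solver; calling it with $\Phi^T$ and $C^TC$ makes it solve Eq. (10.20). The initial state is arbitrary:

```python
import numpy as np

def dlyap(A, W):
    """Solve A X A^T - X + W = 0 via row-major vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return x.reshape(n, n)

Phi = np.array([[0.7, 0.08], [1.0, 0.0]])   # system of Example 10.1
C = np.array([[0.22, 0.0]])

Q = dlyap(Phi.T, C.T @ C)                   # Phi^T Q Phi - Q + C^T C = 0, Eq. (10.20)

x0 = np.array([1.0, -0.5])                  # arbitrary initial state, u_k = 0
x, output_energy = x0.copy(), 0.0
for k in range(500):                        # 0.8**500 is negligible
    y = C @ x
    output_energy += float(y @ y)
    x = Phi @ x
print(output_energy, x0 @ Q @ x0)
```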
Balanced realization and model reduction

The reachability and observability Gramians $P$ and $Q$ thus define matrices that describe the sensitivity of the input-output map in different directions of the state space (independently of any direct term). Consider a state-space transformation $z_k = Tx_k$ and its state-space equations

$$\mathcal{S}:\begin{cases} z_{k+1} = \Phi'z_k + \Gamma'u_k = T\Phi T^{-1}z_k + T\Gamma u_k\\ y_k = C'z_k = CT^{-1}z_k \end{cases} \tag{10.21}$$

The Gramians for the system in Eq. (10.21) are

$$P_z = TPT^T, \qquad Q_z = T^{-T}QT^{-1} \tag{10.22}$$

Different state-space realizations may thus result in different Gramians, and one may ask whether there is a transformation $T$ such that $P_z = Q_z$. This property is in fact achieved by choosing a state-space representation $z$ with equal and diagonal controllability and observability Gramians

$$P_z = Q_z = \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots), \qquad \sigma_i = \sqrt{\lambda_i(PQ)} \tag{10.23}$$

where $\lambda_i(PQ)$ denotes the $i$th eigenvalue of the matrix $PQ$ and $\Sigma$ is a diagonal matrix with elements $\sigma_i$. One algorithm uses the Cholesky factor $Q_1$ of $Q$ and the eigenvalue decomposition $U\Sigma^2U^T$ of $Q_1PQ_1^T$ as intermediate results and determines the state-space transformation matrix $T$ as

$$\begin{aligned} Q &= Q_1^TQ_1\\ Q_1PQ_1^T &= U\Sigma^2U^T, \qquad U^TU = I\\ T &= \Sigma^{-1/2}U^TQ_1 \end{aligned} \tag{10.24}$$


The resulting state-space realization is interesting because it has similar ("balanced") properties of reachability and observability, and the magnitude of the elements $\sigma_i$ of the Gramian $\Sigma$ expresses the relative importance of each state $(z_k)_i$ for the input-output behavior.
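The algorithm of Eq. (10.24) can be sketched in NumPy as follows. This is a minimal sketch, not the author's implementation. The Lyapunov equations are solved by brute-force Kronecker vectorization (adequate only for small $n$), the eigendecomposition of the symmetric matrix $Q_1PQ_1^T$ is taken from `numpy.linalg.svd`, and `dlyap` and `balance_discrete` are names of our own. Applied to the controllable canonical realization (10.29) of Example 10.1, it should give $\Sigma \approx \operatorname{diag}(0.5510, 0.0169)$. The signs of individual states may differ from Eq. (10.30), since a balanced realization is unique only up to the sign of each state.

```python
import numpy as np

def dlyap(A, W):
    """Solve A X A^T - X + W = 0 via row-major vectorization (small n only)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    X = x.reshape(n, n)
    return (X + X.T) / 2.0                  # remove round-off asymmetry

def balance_discrete(Phi, Gamma, C):
    """Balancing transformation following the steps of Eq. (10.24)."""
    P = dlyap(Phi, Gamma @ Gamma.T)         # reachability Gramian, Eq. (10.17)
    Q = dlyap(Phi.T, C.T @ C)               # observability Gramian, Eq. (10.20)
    Q1 = np.linalg.cholesky(Q).T            # numpy gives lower L with Q = L L^T, so Q1 = L^T
    U, s, _ = np.linalg.svd(Q1 @ P @ Q1.T)  # symmetric PSD: U diag(s) U^T with s = sigma^2
    sigma = np.sqrt(s)
    T = np.diag(sigma ** -0.5) @ U.T @ Q1   # T = Sigma^(-1/2) U^T Q1
    Tinv = np.linalg.inv(T)
    return T @ Phi @ Tinv, T @ Gamma, C @ Tinv, sigma, T

# Controllable canonical realization (10.29) of Example 10.1
Phi = np.array([[0.7, 0.08], [1.0, 0.0]])
Gamma = np.array([[1.0], [0.0]])
C = np.array([[0.22, 0.0]])

Phi_b, Gamma_b, C_b, sigma, T = balance_discrete(Phi, Gamma, C)
print(np.round(sigma, 4))
print(np.round(Phi_b, 4))
```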

Consider the diagonal matrix $\Sigma$, where large values $\sigma_i$ represent essential states $z_i$, whereas small values $\sigma_j$ represent states $z_j$ that are less important for the input-output behavior. It is then straightforward to suggest elimination of rows $j$ and columns $j$ of a state-space realization with a small element $\sigma_j$ of the Gramian. Let $\Phi' = T\Phi T^{-1}$ and $\Gamma' = T\Gamma$ denote the transformed system matrices of Eq. (10.21) and let the state vector $z_k = Tx_k$ be decomposed as


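$$z_k = \begin{pmatrix} z_k^1\\ z_k^0 \end{pmatrix} \tag{10.25}$$

where $z_k^0$ is the vector of components that are suggested to be eliminated. The state-space equation in Eq. (10.21) is then

$$\begin{pmatrix} z_{k+1}^1\\ z_{k+1}^0 \end{pmatrix} = \begin{pmatrix} \Phi_{11} & \Phi_{10}\\ \Phi_{01} & \Phi_{00} \end{pmatrix}\begin{pmatrix} z_k^1\\ z_k^0 \end{pmatrix} + \begin{pmatrix} \Gamma_1\\ \Gamma_0 \end{pmatrix}u_k, \qquad y_k = C'z_k = \begin{pmatrix} C_1 & C_0 \end{pmatrix}\begin{pmatrix} z_k^1\\ z_k^0 \end{pmatrix} \tag{10.26}$$

If we neglect the dynamics of $z_k^0$ by assuming that $z_k^0$ has no dynamics independent of $z_k^1$ and $u_k$ (so that $z_{k+1}^0 = z_k^0$), we can eliminate $z_k^0 = (I - \Phi_{00})^{-1}(\Phi_{01}z_k^1 + \Gamma_0u_k)$ from Eq. (10.26) and obtain the reduced-order model dynamics

$$\begin{aligned} z_{k+1}^1 &= \left(\Phi_{11} + \Phi_{10}(I - \Phi_{00})^{-1}\Phi_{01}\right)z_k^1 + \left(\Gamma_1 + \Phi_{10}(I - \Phi_{00})^{-1}\Gamma_0\right)u_k\\ y_k &= \left(C_1 + C_0(I - \Phi_{00})^{-1}\Phi_{01}\right)z_k^1 + C_0(I - \Phi_{00})^{-1}\Gamma_0u_k \end{aligned} \tag{10.27}$$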


A model reduction guided by the magnitude of the singular values in $\Sigma$ is a balanced model reduction. We illustrate the procedure with the following example.

Example 10.1—Model reduction from balanced realization 
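Consider the transfer function

$$H(z) = \frac{0.22z^{-1}}{1 - 0.7z^{-1} - 0.08z^{-2}} \tag{10.28}$$

with the controllable canonical realization

$$x(k+1) = \begin{pmatrix} 0.7 & 0.08\\ 1 & 0 \end{pmatrix}x(k) + \begin{pmatrix} 1\\ 0 \end{pmatrix}u(k), \qquad y(k) = \begin{pmatrix} 0.22 & 0 \end{pmatrix}x(k) \tag{10.29}$$

A balanced realization is

$$x(k+1) = \begin{pmatrix} 0.7869 & 0.1079\\ 0.1079 & -0.0869 \end{pmatrix}x(k) + \begin{pmatrix} 0.4579\\ -0.1018 \end{pmatrix}u(k), \qquad y(k) = \begin{pmatrix} 0.4579 & -0.1018 \end{pmatrix}x(k) \tag{10.30}$$

with

$$\Sigma = \begin{pmatrix} 0.5510 & 0\\ 0 & 0.0169 \end{pmatrix} \tag{10.31}$$

and the state-space transformation matrix

$$T = \begin{pmatrix} 0.4579 & 0.0288\\ -0.1018 & 0.1295 \end{pmatrix} \tag{10.32}$$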


The elements of the diagonalized Gramian $\Sigma$ are of very different magnitudes (0.5510 versus 0.0169), which indicates that a first-order model would be sufficient. Elimination of the second state vector component of Eq. (10.30) according to Eq. (10.27) results in the balanced reduced first-order model

$$\begin{aligned} x(k+1) &= 0.7976x(k) + 0.4478u(k)\\ y(k) &= 0.4478x(k) + 0.0095u(k) \end{aligned} \tag{10.33}$$


■ 
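For Example 10.1, the reduction step (10.27) involves only scalars, since one state is kept and one is eliminated. This pure-Python sketch (function name our own) plugs in the rounded balanced matrices of Eq. (10.30) and should reproduce Eq. (10.33) to about four decimals. It also checks that the static gain is unchanged, which holds exactly for this kind of elimination:

```python
# Balanced realization (10.30), partitioned as in Eq. (10.26):
# z^1 = first (kept) state, z^0 = second (eliminated) state.
phi11, phi10 = 0.7869, 0.1079
phi01, phi00 = 0.1079, -0.0869
gam1, gam0 = 0.4579, -0.1018
c1, c0 = 0.4579, -0.1018

def residualize_scalar(phi11, phi10, phi01, phi00, gam1, gam0, c1, c0):
    """Eq. (10.27) when both z^1 and z^0 are scalars."""
    k = 1.0 / (1.0 - phi00)            # (I - Phi00)^-1
    a = phi11 + phi10 * k * phi01
    b = gam1 + phi10 * k * gam0
    c = c1 + c0 * k * phi01
    d = c0 * k * gam0
    return a, b, c, d

a, b, c, d = residualize_scalar(phi11, phi10, phi01, phi00, gam1, gam0, c1, c0)

# Static gain C (I - Phi)^-1 Gamma of the full balanced model (2x2 inverse written out)
det = (1.0 - phi11) * (1.0 - phi00) - phi10 * phi01
x1 = ((1.0 - phi00) * gam1 + phi10 * gam0) / det
x2 = (phi01 * gam1 + (1.0 - phi11) * gam0) / det
gain_full = c1 * x1 + c0 * x2
gain_reduced = c * b / (1.0 - a) + d
print(round(a, 4), round(b, 4), round(c, 4), round(d, 4), gain_full, gain_reduced)
```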


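Balanced model reduction for continuous-time systems

Assuming that the balanced state-space representation of a given continuous-time system is

$$\frac{d}{dt}\begin{pmatrix} x_1\\ x_0 \end{pmatrix} = \begin{pmatrix} A_{11} & A_{10}\\ A_{01} & A_{00} \end{pmatrix}\begin{pmatrix} x_1\\ x_0 \end{pmatrix} + \begin{pmatrix} B_1\\ B_0 \end{pmatrix}u, \qquad y = \begin{pmatrix} C_1 & C_0 \end{pmatrix}\begin{pmatrix} x_1\\ x_0 \end{pmatrix} + Du \tag{10.34}$$

one can approach model reduction by approximating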


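$$\dot x_0 = 0, \quad \text{and} \quad x_0 = -A_{00}^{-1}A_{01}x_1 - A_{00}^{-1}B_0u \tag{10.35}$$

The reduced-order state-space model is then

$$\begin{aligned} \dot x_1 &= \left(A_{11} - A_{10}A_{00}^{-1}A_{01}\right)x_1 + \left(B_1 - A_{10}A_{00}^{-1}B_0\right)u\\ y &= \left(C_1 - C_0A_{00}^{-1}A_{01}\right)x_1 + \left(D - C_0A_{00}^{-1}B_0\right)u \end{aligned} \tag{10.36}$$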


which contains a direct term from $u$ to $y$ even if the full-order direct term is zero. This reduction principle is natural in many applications, as it eliminates the dynamics of $x_0$ while preserving the essential low-frequency properties; it is natural at least in applications where a good fit between the full-order model and the reduced-order model in the low-frequency range is important. Another choice is to put $x_0 \approx 0$, which results in the reduced-order model

$$\dot x_1 = A_{11}x_1 + B_1u, \qquad y = C_1x_1 + Du \tag{10.37}$$

or, more generally, $\dot x_0 = \alpha x_0$ for some constant $\alpha$, which results in the model

$$\begin{aligned} \dot x_1 &= \left(A_{11} + A_{10}(\alpha I - A_{00})^{-1}A_{01}\right)x_1 + \left(B_1 + A_{10}(\alpha I - A_{00})^{-1}B_0\right)u\\ y &= \left(C_1 + C_0(\alpha I - A_{00})^{-1}A_{01}\right)x_1 + \left(D + C_0(\alpha I - A_{00})^{-1}B_0\right)u \end{aligned} \tag{10.38}$$


If one denotes the full-order transfer function $G(s)$ and the reduced-order transfer function $G_{\mathrm{red}}(s)$, it follows that

$$\begin{aligned} \dot x_0 = 0: &\quad G_{\mathrm{red}}(0) = G(0)\\ x_0 = 0: &\quad G_{\mathrm{red}}(\infty) = G(\infty)\\ \dot x_0 = \alpha x_0: &\quad G_{\mathrm{red}}(\alpha) = G(\alpha) \end{aligned} \tag{10.39}$$

from which it can be concluded that these model-reduction principles give a good fit at different points in the complex frequency domain.
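The interpolation properties (10.39) can be verified numerically. The NumPy sketch below (function names are ours) uses the balanced realization of Rohrs' system given in Eq. (10.41) of Example 10.2, reduces it to first order with Eqs. (10.36), (10.37), and (10.38), and compares transfer functions at $s = 0$, at a large $s$ standing in for $s = \infty$, and at $s = \alpha$. The value $\alpha = -5$ is an arbitrary choice that only has to avoid the eigenvalues of $A_{00}$.

```python
import numpy as np

# Balanced realization of Rohrs' system, Eq. (10.41), rounded to four decimals
A = np.array([[-0.6683, -1.6355, -0.6166],
              [ 1.6355, -8.2111, -7.4027],
              [-0.6166,  7.4027, -22.1206]])
B = np.array([[1.2136], [-1.3380], [0.5634]])
C = np.array([[1.2136, 1.3380, 0.5634]])
D = np.zeros((1, 1))

def freq_resp(A, B, C, D, s):
    """G(s) = C (sI - A)^-1 B + D for a SISO model."""
    n = A.shape[0]
    return (C @ np.linalg.solve(s * np.eye(n) - A, B) + D)[0, 0]

def split(A, B, C, r):
    return A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:], B[:r], B[r:], C[:, :r], C[:, r:]

def reduce_alpha(A, B, C, D, r, alpha):
    """Eq. (10.38): replace dx0/dt by alpha*x0; alpha = 0 gives Eq. (10.36)."""
    A11, A10, A01, A00, B1, B0, C1, C0 = split(A, B, C, r)
    K = np.linalg.inv(alpha * np.eye(A00.shape[0]) - A00)
    return A11 + A10 @ K @ A01, B1 + A10 @ K @ B0, C1 + C0 @ K @ A01, D + C0 @ K @ B0

def truncate(A, B, C, D, r):
    """Eq. (10.37): set x0 = 0."""
    return A[:r, :r], B[:r], C[:, :r], D

alpha = -5.0
red_dc = reduce_alpha(A, B, C, D, 1, 0.0)       # should match at s = 0
red_alpha = reduce_alpha(A, B, C, D, 1, alpha)  # should match at s = alpha
red_trunc = truncate(A, B, C, D, 1)             # should match as s -> infinity
s_big = 1e9

print(freq_resp(A, B, C, D, 0.0), freq_resp(*red_dc, 0.0))
print(freq_resp(A, B, C, D, alpha), freq_resp(*red_alpha, alpha))
print(red_dc[0][0, 0], red_dc[3][0, 0])         # compare with Eq. (10.46)
```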


Example 10.2—Rohrs’ system 

Consider the following third-order linear system 


$$G(s) = \frac{2}{s + 1}\cdot\frac{229}{s^2 + 30s + 229} \tag{10.40}$$


A balanced state-space realization of this transfer function is

$$\dot x = \begin{pmatrix} -0.6683 & -1.6355 & -0.6166\\ 1.6355 & -8.2111 & -7.4027\\ -0.6166 & 7.4027 & -22.1206 \end{pmatrix}x + \begin{pmatrix} 1.2136\\ -1.3380\\ 0.5634 \end{pmatrix}u, \qquad y = \begin{pmatrix} 1.2136 & 1.3380 & 0.5634 \end{pmatrix}x \tag{10.41}$$


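with the Gramian and its singular values

$$\Sigma = \begin{pmatrix} 1.1018 & 0 & 0\\ 0 & 0.1090 & 0\\ 0 & 0 & 0.0072 \end{pmatrix} \tag{10.42}$$

and the transformation matrix $T$ between the state $x$ of the original realization and the balanced state $z = Tx$

$$T = \begin{pmatrix} 1.2136 & 38.6514 & 345.4191\\ -1.3380 & -32.6780 & 26.8176\\ 0.5634 & -5.6501 & 5.1795 \end{pmatrix} \tag{10.43}$$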
Figure 10.2 Step responses, impulse responses, and transfer functions of Rohrs' system and of reduced-order models. The step responses of the balanced reduced-order models (dashed lines) are virtually indistinguishable from the original response (solid line). Responses of the heuristically reduced first-order model $2/(s + 1)$ are shown by the dotted line.


The reduced second-order model is

$$\dot x = \begin{pmatrix} -0.6512 & -1.8419\\ 1.8419 & -10.6884 \end{pmatrix}x + \begin{pmatrix} 1.1979\\ -1.5266 \end{pmatrix}u, \qquad y = \begin{pmatrix} 1.1979 & 1.5266 \end{pmatrix}x + 0.0144u \tag{10.44}$$

with the transfer function

$$G_2(s) = \frac{-0.8956s + 20.5561}{s^2 + 11.3395s + 10.3523} + 0.0144 \tag{10.45}$$

The reduced first-order model is

$$\dot x = -0.9686x + 1.4610u, \qquad y = 1.4610x - 0.2037u \tag{10.46}$$

Notice that the step responses and the impulse responses in Fig. 10.2 are well preserved by the balanced model reduction, as compared with the heuristic model reduction to $G_1(s) = 2/(s + 1)$.
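Example 10.2 can be reproduced from the transfer function alone. In this NumPy sketch (our own illustration, not the book's code), Rohrs' system is written in controllable canonical form, balanced with the steps of Eq. (10.24) adapted to the continuous-time Lyapunov equations $AP + PA^T + BB^T = 0$ and $A^TQ + QA + C^TC = 0$, and then reduced with Eq. (10.36). Up to rounding and the signs of the states, the results should agree with Eqs. (10.42), (10.44), and (10.46).

```python
import numpy as np

def clyap(A, W):
    """Solve A X + X A^T + W = 0 via row-major vectorization (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    X = np.linalg.solve(np.kron(A, I) + np.kron(I, A), -W.reshape(-1)).reshape(n, n)
    return (X + X.T) / 2.0

def balance_continuous(A, B, C):
    P = clyap(A, B @ B.T)                    # reachability Gramian
    Q = clyap(A.T, C.T @ C)                  # observability Gramian
    Q1 = np.linalg.cholesky(Q).T             # Q = Q1^T Q1
    U, s, _ = np.linalg.svd(Q1 @ P @ Q1.T)   # U diag(sigma^2) U^T
    sigma = np.sqrt(s)
    T = np.diag(sigma ** -0.5) @ U.T @ Q1
    Ti = np.linalg.inv(T)
    return T @ A @ Ti, T @ B, C @ Ti, sigma

def residualize(A, B, C, D, r):
    """Eq. (10.36): set dx0/dt = 0 for the states after the first r."""
    A11, A10, A01, A00 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    B1, B0, C1, C0 = B[:r], B[r:], C[:, :r], C[:, r:]
    X = np.linalg.solve(A00, np.hstack([A01, B0]))   # A00^-1 [A01 B0]
    Xa, Xb = X[:, :r], X[:, r:]
    return A11 - A10 @ Xa, B1 - A10 @ Xb, C1 - C0 @ Xa, D - C0 @ Xb

def dc_gain(A, B, C, D):
    return (D - C @ np.linalg.solve(A, B))[0, 0]

# Rohrs' system 458 / (s^3 + 31 s^2 + 259 s + 229) in controllable canonical form
A = np.array([[-31.0, -259.0, -229.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
B = np.array([[1.0], [0.0], [0.0]])
C = np.array([[0.0, 0.0, 458.0]])
D = np.zeros((1, 1))

Ab, Bb, Cb, sigma = balance_continuous(A, B, C)
A1, B1, C1, D1 = residualize(Ab, Bb, Cb, D, 1)   # cf. Eq. (10.46)
A2, B2, C2, D2 = residualize(Ab, Bb, Cb, D, 2)   # cf. Eq. (10.44)
print(np.round(sigma, 4), A1, B1 * C1, D1)
```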
Figure 10.3 Expansion of a transfer function $R_{i-1}(s)$ into the coefficients $c_{2i-1}$, $c_{2i}$ and a residual transfer function block $R_i(s)$ as practiced in continued fraction approximation.


10.3 CONTINUED FRACTION APPROXIMATION 


Consider an asymptotically stable system with a transfer function $G(s)$ and develop the high-order transfer function

$$G(s) = \frac{B(s)}{A(s)} = \cfrac{1}{c_1 + \cfrac{1}{\cfrac{c_2}{s} + R_1(s)}} = \cfrac{1}{c_1 + \cfrac{1}{\cfrac{c_2}{s} + \cfrac{1}{c_3 + \cfrac{1}{\cfrac{c_4}{s} + \cdots}}}} \tag{10.47}$$

by expanding $G(s)$ into the coefficients $c_1, c_2, \ldots, c_{2m}$ and a residual transfer function block $R_m(s)$; see Fig. 10.3 and Fig. 10.4. A model reduction method can be proposed by means of the approximation $R_m(s) = 0$, which truncates the sequence of coefficients after $c_{2m}$. This results in an $m$th-order reduced model with the approximating transfer function

$$G_m(s) = \cfrac{1}{c_1 + \cfrac{1}{\cfrac{c_2}{s} + \cdots + \cfrac{1}{c_{2m-1} + \cfrac{1}{c_{2m}/s}}}} \tag{10.48}$$

with coefficients obtained from calculations reminiscent of those for the Routh stability criterion. Consider the numerator and denominator polynomials of the full-order transfer function $G(s)$ expressed as

$$A(s) = a_0 + a_1s + a_2s^2 + \cdots = P^{(-2)}(s), \qquad B(s) = b_0 + b_1s + b_2s^2 + \cdots = P^{(-1)}(s) \tag{10.49}$$

Figure 10.4 Interpretation of the continued fraction approximation as a recursive closed-loop system with subsystems $G_i$.


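with polynomial coefficients entering the Routh array

$$\begin{array}{cccc|l} a_0 & a_1 & a_2 & \cdots & c_1 = a_0/b_0\\ b_0 & b_1 & b_2 & \cdots & c_2 = b_0/p_0^{(0)}\\ p_0^{(0)} & p_1^{(0)} & p_2^{(0)} & \cdots & c_3 = p_0^{(0)}/p_0^{(1)}\\ p_0^{(1)} & p_1^{(1)} & p_2^{(1)} & \cdots & \;\vdots\\ \vdots & & & & c_{i+2} = p_0^{(i-1)}/p_0^{(i)}\\ p_0^{(i)} & p_1^{(i)} & p_2^{(i)} & \cdots & \end{array} \tag{10.50}$$

where the $k$th polynomial coefficient at iteration $i$ is determined by the order-recursive equation

$$p_k^{(i)} = p_{k+1}^{(i-2)} - p_0^{(i-2)}\frac{p_{k+1}^{(i-1)}}{p_0^{(i-1)}}, \qquad k = 0, 1, 2, \ldots; \quad i = 0, 1, 2, \ldots \tag{10.51}$$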
The coefficients $\{c_i\}$ of the continued fraction expansion are found as successive ratios of the elements of the left-most column of the Routh array.
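The recursion (10.51) is short to program. The pure-Python sketch below (the function name is our own) pads the numerator row with zeros, generates the rows of the Routh array one at a time, and returns the successive ratios $c_i$. Applied to the rounded coefficients of Rohrs' system used in Example 10.3, it should reproduce the values in Eq. (10.53).

```python
def continued_fraction_coeffs(a, b, count):
    """Coefficients c_1, c_2, ... of Eq. (10.47) via the recursion (10.51).

    a: denominator coefficients a_0, a_1, ... (ascending powers of s), row P^(-2)
    b: numerator coefficients b_0, b_1, ... (ascending powers of s), row P^(-1)
    """
    width = max(len(a), len(b))
    rows = [list(a) + [0.0] * (width - len(a)), list(b) + [0.0] * (width - len(b))]
    coeffs = []
    while len(coeffs) < count:
        prev2, prev1 = rows[-2], rows[-1]        # rows P^(i-2) and P^(i-1)
        if not prev2 or not prev1 or prev1[0] == 0.0:
            break                                # the expansion terminates
        c = prev2[0] / prev1[0]                  # ratio of left-most elements
        coeffs.append(c)
        # p_k^(i) = p_{k+1}^(i-2) - c * p_{k+1}^(i-1), Eq. (10.51)
        new = [prev2[k + 1] - c * (prev1[k + 1] if k + 1 < len(prev1) else 0.0)
               for k in range(len(prev2) - 1)]
        rows.append(new)
    return coeffs, rows

# Rohrs' system, Eq. (10.52): 2 / (1 + 1.131 s + 0.1354 s^2 + 0.0044 s^3)
coeffs, rows = continued_fraction_coeffs([1.0, 1.131, 0.1354, 0.0044], [2.0], 5)
print([round(c, 4) for c in coeffs])
```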


Example 10.3—Continued fraction approximation 

Consider again Rohrs' transfer function

$$G(s) = \frac{458}{s^3 + 31s^2 + 259s + 229} = \frac{2}{1 + 1.131s + 0.1354s^2 + 0.0044s^3} \tag{10.52}$$

The Routh array of the coefficients of the denominator polynomial (first row) and the numerator polynomial (second row) is presented below, where the coefficients $c_i$ appear as the ratios of successive elements of the first column.

$$\begin{array}{rrrr} 1.0000 & 1.1310 & 0.1354 & 0.0044\\ 2.0000 & 0.0 & 0.0 & 0.0\\ 1.1310 & 0.1354 & 0.0044 & \\ -0.2394 & -0.0078 & & \\ 0.0986 & 0.0044 & & \\ 0.0029 & & & \\ 0.0044 & & & \end{array} \quad\Rightarrow\quad \begin{array}{l} c_1 = 0.5000\\ c_2 = 1.7683\\ c_3 = -4.7236\\ c_4 = -2.4272\\ c_5 = 34.029 \end{array} \tag{10.53}$$


A first-order continued fraction approximation of $G(s)$ is then obtained as

$$G(s) \approx G_1(s) = \cfrac{1}{c_1 + \cfrac{1}{c_2/s}} = \frac{1.7683}{s + 0.8842} = \frac{2.0000}{1 + 1.1310s} \tag{10.54}$$

A second-order approximation $G_2(s)$ of $G(s)$ is obtained as

$$G_2(s) = \frac{(c_2 + c_4)s + c_2c_3c_4}{s^2 + (c_1c_2 + c_1c_4 + c_3c_4)s + c_1c_2c_3c_4} = \frac{-0.6588s + 20.2744}{s^2 + 11.1357s + 10.1372} = \frac{2.000 - 0.0650s}{1 + 1.0985s + 0.0986s^2} \tag{10.55}$$

which reproduces the static gain of the full-order model (see Fig. 10.5). ■
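Eq. (10.55) can be cross-checked by evaluating the truncated continued fraction directly. This pure-Python sketch uses the rounded coefficients of Eq. (10.53), computes the closed-form numerator and denominator, normalizes them to a unit constant term, and compares with the nested fraction at an arbitrary test point $s = 0.7$:

```python
c1, c2, c3, c4 = 0.5000, 1.7683, -4.7236, -2.4272   # from Eq. (10.53)

# Closed form of Eq. (10.55), coefficients in descending powers of s
num = [c2 + c4, c2 * c3 * c4]
den = [1.0, c1 * c2 + c1 * c4 + c3 * c4, c1 * c2 * c3 * c4]

# Normalize to (b0 + b1 s) / (1 + a1 s + a2 s^2)
k = den[2]
b0, b1 = num[1] / k, num[0] / k
a1, a2 = den[1] / k, den[0] / k

def g2_nested(s):
    """Truncated continued fraction, Eq. (10.48) with m = 2."""
    return 1.0 / (c1 + 1.0 / (c2 / s + 1.0 / (c3 + 1.0 / (c4 / s))))

def g2_rational(s):
    return (b0 + b1 * s) / (1.0 + a1 * s + a2 * s * s)

print(round(b0, 4), round(b1, 4), round(a1, 4), round(a2, 4))
print(g2_nested(0.7), g2_rational(0.7))
```

Since $b_0 = c_2c_3c_4/(c_1c_2c_3c_4) = 1/c_1$, the static gain $G_2(0) = 1/c_1 = 2$ is reproduced exactly, whatever the remaining coefficients.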


Interpretation of continued fraction approximation 

The continued fraction approximation is based on the expansion

$$\frac{B(s)}{A(s)} = \cfrac{1}{c_1 + \cfrac{1}{\cfrac{c_2}{s} + R_1(s)}} \tag{10.56}$$
Figure 10.5 Step responses and transfer function magnitude of Rohrs' system (10.52) (solid line) and reduced-order models (dashed lines) obtained by continued fraction approximation. The dotted line shows the result from the heuristic model reduction $G_1(s) = 2/(s + 1)$.


which may be interpreted graphically according to Fig. 10.3 and Fig. 10.4. The model reduction presupposes that the innermost transfer function block $R_m$ may be eliminated, i.e., $R_m(s) \approx 0$. Truncation of the continued fraction after two and four coefficients provides the first- and second-order approximations, respectively. A condition for a good approximation is that the feedforward gain $c_{2m}/s$ is much larger than the gain of the omitted transfer function block $R_m(s)$.

The continued fraction expansion may also be applied to discrete-time systems, for which the order recursion and the forward shift properties interact in an interesting manner. Consider the input and output variables which exhibit the dependencies

$$\begin{aligned} Y_i(z) &= c_{2i}z^{-1}U_{i+1}(z) + Y_{i+1}(z)\\ U_{i+1}(z) &= -c_{2i-1}Y_i(z) + U_i(z) \end{aligned} \tag{10.57}$$

with a mixed backward/forward dependence in the order recursion; see Fig. 10.6.
Figure 10.6 Interpretation of the continued fraction expansion as a recursive transfer function relationship in a forward (right) or a forward/backward manner (left).


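A pure forward order-recursive equation is obtained as

$$\begin{pmatrix} Y_{i+1}(z)\\ U_{i+1}(z) \end{pmatrix} = \begin{pmatrix} 1 + c_{2i-1}c_{2i}z^{-1} & -c_{2i}z^{-1}\\ -c_{2i-1} & 1 \end{pmatrix}\begin{pmatrix} Y_i(z)\\ U_i(z) \end{pmatrix} \tag{10.58}$$

where the transfer matrix is unimodular, i.e.,

$$\det\begin{pmatrix} 1 + c_{2i-1}c_{2i}z^{-1} & -c_{2i}z^{-1}\\ -c_{2i-1} & 1 \end{pmatrix} = 1 \tag{10.59}$$

so that

$$\begin{pmatrix} Y_i(z)\\ U_i(z) \end{pmatrix} = \begin{pmatrix} 1 & c_{2i}z^{-1}\\ c_{2i-1} & 1 + c_{2i-1}c_{2i}z^{-1} \end{pmatrix}\begin{pmatrix} Y_{i+1}(z)\\ U_{i+1}(z) \end{pmatrix} \tag{10.60}$$

Some aspects of these nice algebraic properties are exploited in lattice algorithms; see Chapter 11.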
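The unimodularity in Eqs. (10.58) to (10.60) can be confirmed with polynomial arithmetic in $w = z^{-1}$. This pure-Python sketch represents each matrix entry as a list of coefficients in ascending powers of $w$; the sample values for $c_{2i-1}$ and $c_{2i}$ are borrowed from Example 10.3, but any values would do:

```python
def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0) for i in range(n)]

def pneg(p):
    return [-x for x in p]

def pmul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def trim(p, tol=1e-12):
    """Drop (numerically) zero high-order coefficients."""
    p = list(p)
    while len(p) > 1 and abs(p[-1]) < tol:
        p.pop()
    return p

def matmul2(M, N):
    return [[padd(pmul(M[i][0], N[0][j]), pmul(M[i][1], N[1][j])) for j in range(2)]
            for i in range(2)]

def det2(M):
    return padd(pmul(M[0][0], M[1][1]), pneg(pmul(M[0][1], M[1][0])))

c_odd, c_even = 0.5, 1.7683          # sample values of c_{2i-1} and c_{2i}
# Forward matrix of Eq. (10.58); entries are coefficient lists [w^0, w^1, ...]
F = [[[1.0, c_odd * c_even], [0.0, -c_even]],
     [[-c_odd],              [1.0]]]
# Inverse matrix of Eq. (10.60)
G = [[[1.0],   [0.0, c_even]],
     [[c_odd], [1.0, c_odd * c_even]]]

FG = [[trim(e) for e in row] for row in matmul2(F, G)]
print(trim(det2(F)), FG)
```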


10.4 MOMENT MATCHING 

Consider a transfer function given as the (infinite) series

$$H(z^{-1}) = h_0 + h_1z^{-1} + h_2z^{-2} + \cdots \tag{10.61}$$

Matching of the reduced-order model $B_m/A_m$

$$H_m(z^{-1}) = \frac{B_m(z^{-1})}{A_m(z^{-1})} = \frac{b_0 + b_1z^{-1} + \cdots + b_mz^{-m}}{1 + a_1z^{-1} + \cdots + a_mz^{-m}} \tag{10.62}$$

can be made so that the moments

$$M_k = \sum_{n=0}^{\infty}n^kh_n, \qquad k = 0, 1, 2, \ldots \tag{10.63}$$

match those of the original transfer function up to the $2m$th moment. Based on the observation that $M_k$ can be formulated as a weighted sum of the transfer function $H$ and its derivatives with respect to $z^{-1}$ up to order $k$, evaluated at $z = 1$, the following "moment-matching" procedure has been proposed

$$\begin{aligned} M_0 &= H\big|_{z=1}\\ M_1 &= \frac{dH}{dz^{-1}}\Big|_{z=1}\\ M_2 &= \frac{d^2H}{d(z^{-1})^2}\Big|_{z=1} + \frac{dH}{dz^{-1}}\Big|_{z=1}\\ &\;\;\vdots \end{aligned} \tag{10.64}$$


The resulting moment-matched and reduced-order rational transfer function 
can be viewed as an impulse response matching of H m to H. 

Example 10.4—Moment matching 
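Consider the discrete-time transfer function

$$H(z^{-1}) = \frac{0.22z^{-1}}{1 - 0.7z^{-1} - 0.08z^{-2}} = 0 + 0.22z^{-1} + 0.1540z^{-2} + 0.1254z^{-3} + \cdots \tag{10.65}$$

that should be matched by the reduced-order model

$$H_1(z^{-1}) = \frac{b_0 + b_1z^{-1}}{1 + a_1z^{-1}} \tag{10.66}$$

Moment matching by means of differentiation of $H_1$ gives

$$\begin{aligned} H_1\big|_{z=1} &= \frac{b_0 + b_1}{1 + a_1} = 1\\ \frac{dH_1}{dz^{-1}}\Big|_{z=1} &= \frac{b_1 - a_1b_0}{(1 + a_1)^2} = 4.9091\\ \frac{d^2H_1}{d(z^{-1})^2}\Big|_{z=1} &= \frac{-2a_1(b_1 - a_1b_0)}{(1 + a_1)^3} = 39.1074 \end{aligned} \tag{10.67}$$

where the right-hand sides are the corresponding derivatives of $H$ at $z = 1$. Solving these three equations (the ratio of the last two gives $a_1$ directly) yields the reduced-order transfer function

$$H_1(z^{-1}) = \frac{0.0149 + 0.1858z^{-1}}{1 - 0.7993z^{-1}} \tag{10.68}$$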
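The three conditions of Eq. (10.67) can be solved in closed form: the ratio of the second and first derivatives equals $-2a_1/(1 + a_1)$, which gives $a_1$, after which $b_0$ and $b_1$ follow from two linear equations. A pure-Python sketch (helper names are ours) that also confirms $M_1 = dH/dz^{-1}|_{z=1}$ by summing $nh_n$ over a long impulse response:

```python
def poly_deriv_at(p, k, w):
    """k-th derivative at w of the polynomial p[0] + p[1] w + p[2] w^2 + ..."""
    total = 0.0
    for i, coef in enumerate(p):
        if i >= k:
            falling = 1
            for j in range(k):
                falling *= i - j
            total += coef * falling * w ** (i - k)
    return total

def rational_derivs(num, den, w=1.0):
    """H, dH/dw, d^2H/dw^2 for H(w) = num(w)/den(w), with w = z^-1."""
    N0, N1, N2 = (poly_deriv_at(num, k, w) for k in range(3))
    D0, D1, D2 = (poly_deriv_at(den, k, w) for k in range(3))
    H = N0 / D0
    H1 = (N1 * D0 - N0 * D1) / D0 ** 2
    H2 = (N2 * D0 - N0 * D2) / D0 ** 2 - 2.0 * D1 * (N1 * D0 - N0 * D1) / D0 ** 3
    return H, H1, H2

def match_first_order(H, H1, H2):
    """Solve Eq. (10.67) for b0, b1, a1 of (b0 + b1 z^-1)/(1 + a1 z^-1)."""
    r = H2 / H1                      # equals -2 a1 / (1 + a1)
    a1 = -r / (2.0 + r)
    s = H * (1.0 + a1)               # b0 + b1
    g = H1 * (1.0 + a1) ** 2         # b1 - a1 b0
    b0 = (s - g) / (1.0 + a1)
    return b0, s - b0, a1

num, den = [0.0, 0.22], [1.0, -0.7, -0.08]   # Eq. (10.65) in powers of w = z^-1
H, H1, H2 = rational_derivs(num, den)
b0, b1, a1 = match_first_order(H, H1, H2)

# Cross-check M_1 = sum n h_n with the impulse response h_n of Eq. (10.65)
h = [0.0, 0.22]
for n in range(2, 2000):
    h.append(0.7 * h[-1] + 0.08 * h[-2])
M1 = sum(n * hn for n, hn in enumerate(h))
print(round(b0, 4), round(b1, 4), round(a1, 4), M1)
```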
10.5 THE PADE APPROXIMATION

The transfer function can be expanded into the power series

$$G(s) = g_0 + g_1s + g_2s^2 + \cdots \tag{10.69}$$

Assume that we truncate the series $G(s)$ after $2m$ terms and denote this truncated polynomial

$$G_m(s) = g_0 + g_1s + \cdots + g_{2m-1}s^{2m-1} \tag{10.70}$$

One way to match a rational function $B_m/A_m$ to the truncated Taylor expansion $G_m$ is to solve for the polynomial coefficients in the equation

$$B_m(s) = G_m(s)A_m(s) \tag{10.71}$$

where the coefficients of $s^0, \ldots, s^{2m-1}$ are matched. The resulting reduced-order transfer function $B_m(s)/A_m(s)$ is known as the Pade approximation of the transfer function $G(s)$. We now give an example.

Example 10.5—Pade approximation applied to model reduction 
Consider the transfer function

$$G(s) = \frac{Y(s)}{U(s)} = \frac{458}{s^3 + 31s^2 + 259s + 229} \tag{10.72}$$

with the Taylor series expansion

$$G(s) = 2.000 - 2.2620s + 2.2876s^2 - 2.2898s^3 + \cdots \tag{10.73}$$

By matching the first few coefficients in the polynomial equation $B_m(s) = G_m(s)A_m(s)$ for first- and second-order approximations of the type

$$\frac{B_1(s)}{A_1(s)} = \frac{b_0}{1 + a_1s} \tag{10.74}$$

$$\frac{B_2(s)}{A_2(s)} = \frac{b_0 + b_1s}{1 + a_1s + a_2s^2} \tag{10.75}$$

it is straightforward to fit a first-order approximation

$$\left.\begin{aligned} b_0 &= g_0\\ 0 &= g_0a_1 + g_1 \end{aligned}\right\}\;\Rightarrow\;\frac{B_1(s)}{A_1(s)} = \frac{2.0000}{1 + 1.1310s} \tag{10.76}$$

and a second-order approximation

$$\left.\begin{aligned} b_0 &= g_0\\ b_1 &= g_0a_1 + g_1\\ 0 &= g_0a_2 + g_1a_1 + g_2\\ 0 &= g_1a_2 + g_2a_1 + g_3 \end{aligned}\right\}\;\Rightarrow\;\frac{B_2(s)}{A_2(s)} = \frac{2 - 0.0645s}{1 + 1.0987s + 0.0989s^2} \tag{10.77}$$

For two reasons it is necessary to state a warning against the Pade approximation. First, a serious problem is that an approximation of a stable transfer function may yield an unstable transfer function. Second, the impulse response may be poorly matched, as can be inferred from the following example.

Example 10.6—Unstable Pade approximations 
Consider the transfer function

$$\frac{B_2(s)}{A_2(s)} = \frac{2s + 1}{s^2 + s + 1} = g_0 + g_1s + \cdots = 1 + s - 2s^2 + \cdots \tag{10.78}$$

A first-order transfer function approximation $B_1(s)/A_1(s) = b_0/(1 + a_1s)$ can be obtained by application of Eq. (10.76)

$$\left.\begin{aligned} b_0 &= g_0\\ 0 &= g_0a_1 + g_1 \end{aligned}\right\}\;\Rightarrow\;\begin{cases} b_0 = 1\\ a_1 = -1 \end{cases} \tag{10.79}$$

which suggests the reduced-order model

$$\frac{B_1(s)}{A_1(s)} = \frac{1}{1 - s} \tag{10.80}$$

which is unstable and, of course, a very poor approximation of the original transfer function. ■
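The Pade equations used in Examples 10.5 and 10.6 generalize to a small linear system for $a_1, \ldots, a_m$ (the coefficients of $s^m, \ldots, s^{2m-1}$ in $G_mA_m$ must vanish), followed by $b_k = \sum_{j=0}^{k}a_jg_{k-j}$ for $k < m$. A NumPy sketch (function names are ours) that reproduces both examples, including the unstable pole of Eq. (10.80):

```python
import numpy as np

def series(num, den, n_terms):
    """Taylor coefficients g_0, g_1, ... of num(s)/den(s) (ascending coefficients, den[0] != 0)."""
    g = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * g[k - j]
        g.append(acc / den[0])
    return g

def pade(g, m):
    """B_m/A_m with deg B = m - 1, deg A = m, A(0) = 1, matching g_0, ..., g_{2m-1}."""
    M = np.array([[g[k - j] for j in range(1, m + 1)] for k in range(m, 2 * m)])
    a = np.concatenate(([1.0], np.linalg.solve(M, -np.array(g[m:2 * m]))))
    b = np.array([sum(a[j] * g[k - j] for j in range(k + 1)) for k in range(m)])
    return b, a

# Example 10.5: 458 / (229 + 259 s + 31 s^2 + s^3)
g = series([458.0], [229.0, 259.0, 31.0, 1.0], 4)
b1, a1 = pade(g, 1)
b2, a2 = pade(g, 2)

# Example 10.6: (1 + 2 s) / (1 + s + s^2)
h = series([1.0, 2.0], [1.0, 1.0, 1.0], 2)
b_u, a_u = pade(h, 1)
pole = -1.0 / a_u[1]                 # root of 1 + a_1 s
print(np.round(g, 4), b2, a2, pole)
```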


10.6 DESCRIBING FUNCTION ANALYSIS 

The Laplace and Fourier transforms are powerful methods for the analysis of linear systems, but few such methods are applicable to nonlinear systems. There are, however, attempts to extend frequency domain analysis to nonlinear systems, and one such method, describing function analysis (or harmonic linearization), is based on harmonic analysis. The method starts with an assumed periodic solution sufficiently close to a sinusoidal oscillation of the form

$$y(t) = C\sin(\omega t) \tag{10.81}$$

for a system with a nonlinear block; see Fig. 10.7. Static and dynamic blocks can be described by some function

$$u = n(x, \dot x) \tag{10.82}$$

It is also assumed that the input to the nonlinear element is close to a sinusoidal oscillation, although there are no precise quantitative criteria to establish the validity of this approximation.

Figure 10.7 Describing function analysis. (Block diagram: reference $r = 0$, nonlinear element $u = n(x, dx/dt)$, linear part $G_0(i\omega)$, and feedback gain $-1$.)

If the forcing input $x$ is a periodic function, then the output is also a periodic function of time, and the output may thus be developed in a Fourier series expansion. For a periodic function $f(t) = f(t + T)$ (for all $t$ and a period $T$, with fundamental angular frequency $\omega_0 = 2\pi/T$), the Fourier series expansion is

$$f(t) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}a_k\cos(k\omega_0t) + \sum_{k=1}^{\infty}b_k\sin(k\omega_0t) \tag{10.83}$$

or, in complex exponential form,

$$f(t) = \sum_{k=-\infty}^{\infty}\gamma_ke^{ik\omega_0t}, \qquad \gamma_k = \frac{1}{T}\int_{-T/2}^{T/2}f(t)e^{-ik\omega_0t}\,dt \tag{10.84}$$

with the coefficients

$$a_k = \frac{2}{T}\int_{-T/2}^{T/2}f(t)\cos(k\omega_0t)\,dt, \qquad b_k = \frac{2}{T}\int_{-T/2}^{T/2}f(t)\sin(k\omega_0t)\,dt \tag{10.85}$$

or, in polar coordinates,

$$c_k = \sqrt{a_k^2 + b_k^2}, \qquad \varphi_k = \arctan\frac{a_k}{b_k} \tag{10.86}$$

Real-valued functions thus take the form

$$f(t) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}c_k\sin(k\omega_0t + \varphi_k) \tag{10.87}$$


The analytic approach is to determine the conditions under which the expected oscillations occur. Consider a dynamic system with a periodic oscillation. This oscillation may be described with a Fourier series consisting of one fundamental frequency and higher harmonics. The amplification at the fundamental frequency (i.e., $\omega$) is the ratio of the first Fourier coefficient $c_1$ of the output of the nonlinearity to its input amplitude $C$

$$N(C) = \frac{c_1}{C}e^{i\varphi_1} = \frac{b_1 + ia_1}{C} \tag{10.88}$$

where, with $T = 2\pi/\omega$,

$$\begin{aligned} a_1 &= \frac{2}{T}\int_{-T/2}^{T/2}n(C\sin\omega t, C\omega\cos\omega t)\cos\omega t\,dt\\ b_1 &= \frac{2}{T}\int_{-T/2}^{T/2}n(C\sin\omega t, C\omega\cos\omega t)\sin\omega t\,dt \end{aligned} \tag{10.89}$$

The describing function $N(C)$ thus gives the amplitude-dependent gain $|N(C)|$ and the phase shift $\varphi_1(\omega)$ of the nonlinear element. It is important to note that the describing function is related only to the nonlinearity $n(x, \dot x)$ and not to the linear part of the system.


Harmonics caused by nonlinearities are ignored in describing function analysis, and a balance for the fundamental Fourier component defines an equation for the amplitude and frequency of a possible sustained oscillation.


Characteristic equation 

Assume that the transfer function of the linear part of the control loop is

$$G_0(s) = \frac{B(s)}{A(s)} \tag{10.90}$$

A differential equation describing the behavior of the closed-loop system in the absence of external inputs is

$$A(p)x(t) + B(p)n(x, \dot x) = 0, \qquad p = \frac{d}{dt} \tag{10.91}$$

with a linear approximation of the form

$$\big(1 + G_0(p)N(C)\big)x(t) = 0 \tag{10.92}$$

The characteristic equation has an approximate solution for $\omega$ and $C$ satisfying the equation

$$G_0(i\omega) = -\frac{1}{N(C)} \tag{10.93}$$

Figure 10.8 A rate-limited servo.


This equation can be solved graphically by including the graph of $-1/N(C)$ in a Nyquist diagram. An intersection between the Nyquist curve $G_0(i\omega)$ and the describing function curve $-1/N(C)$ indicates the possible existence of a limit cycle with amplitude $C$ and frequency $\omega_0$ (see Fig. 10.9).

The methods of describing function analysis have been successful for analysis 
of limit cycles and sustained nonlinear oscillations. (The term “limit cycle” 
derives from phase plane analysis where the limit cycle describes an isolated 
path corresponding to the periodic solution.) 

Example 10.7—A rate-limited servo 

Consider the rate-limited servo in Fig. 10.8. The describing function for the saturating amplifier in Fig. 10.8 is

$$N(C) = \begin{cases} \dfrac{2}{\pi}\left(\arcsin\dfrac{S}{C} + \dfrac{S}{C}\sqrt{1 - \dfrac{S^2}{C^2}}\right), & C > S\\[2mm] 1, & C \le S \end{cases} \tag{10.94}$$

where $S$ is the saturation limit of the amplifier. The describing function is depicted in Fig. 10.9 in the form $-1/N(C)$, which reaches its maximum value $-1$ for small values of $C$, i.e., in the linear range of $N(C)$.

Notice that the whole describing function curve $-1/N(C)$ has the phase $-\pi$ radians. The transfer function $G_0(i\omega)$ between $u$ and $y$ has a phase delay of $-\pi$ radians for $\omega = 1.414$, and the corresponding gain is $|G_0(i\cdot1.414)| = 1/6$. From this information we expect the Nyquist contour and the describing function to cross for $K > 6$, for which we expect a periodic solution with period $2\pi/1.414 = 4.44$. Simulations are shown in Fig. 10.10, which support the calculated values.
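Eq. (10.89) can also be evaluated numerically, which is useful for nonlinearities without a closed-form describing function. The pure-Python sketch below uses the midpoint rule over one period and assumes a unit-slope saturation with limit $S$ as the model of the saturating amplifier; under that assumption it should agree with the closed form (10.94). The function names are our own.

```python
import math

def describing_function(n, C, omega=1.0, samples=20000):
    """N(C) = (b1 + i a1)/C from Eq. (10.89), midpoint rule over one period T = 2 pi/omega."""
    T = 2.0 * math.pi / omega
    dt = T / samples
    a1 = b1 = 0.0
    for k in range(samples):
        t = -T / 2.0 + (k + 0.5) * dt
        u = n(C * math.sin(omega * t), C * omega * math.cos(omega * t))
        a1 += u * math.cos(omega * t) * dt
        b1 += u * math.sin(omega * t) * dt
    return complex(2.0 / T * b1, 2.0 / T * a1) / C

def saturation(S):
    """Unit-slope saturation u = n(x, xdot) with limit S (assumed amplifier model)."""
    return lambda x, xdot: max(-S, min(S, x))

def saturation_df(C, S):
    """Closed form, Eq. (10.94)."""
    if C <= S:
        return 1.0
    r = S / C
    return 2.0 / math.pi * (math.asin(r) + r * math.sqrt(1.0 - r * r))

S = 1.0
for C in (0.5, 2.0, 5.0):
    N = describing_function(saturation(S), C)
    print(C, N, saturation_df(C, S), -1.0 / N.real)
```

For this static, odd nonlinearity $a_1 = 0$, so $N(C)$ is real and $-1/N(C)$ lies on the negative real axis, as stated above.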
The analysis is made under the assumption that the input to the nonlinearity is a sinusoid. The amplitude characteristic of the linear element has to be of low-pass nature so that it introduces a significant attenuation at the frequencies of higher harmonics, and the first condition below implies that the linear part indeed has low-pass filter properties. The conditions may be stated as the following list:

i. $|G_0(ik\omega)| \ll |G_0(i\omega)|$, $k = 2, 3, \ldots$, and $|G_0(ik\omega)| \to 0$ as $k \to \infty$.

ii. $G_0(s)$ must not have any imaginary poles $s = \pm ik\omega$.

iii. The function $n(x, \dot x)$ should have finite partial derivatives with respect to $x$ and $\dot x$ and must not be an explicit function of time.

A fourth condition applies to describing function analysis as presented here.

iv. The zero-order coefficient of $n(x, \dot x)$ is zero, $c_0 = 0$.

The fourth condition excludes statically unbalanced systems and systems with 
rectifying properties. Another restriction is that the method is formulated 
without regarding the interference of external inputs or disturbances. 

It is worth mentioning that the describing function analysis belongs to the 
class of methods which assume the existence of a solution and then proceed 
to show its characteristics. If the method prerequisites are not satisfied the 
describing function analysis may predict oscillations that do not exist and may 
fail to predict periodic solutions that indeed do exist. 

It is clear from Example 10.7 that describing function analysis determines the gain and phase of the transfer function at one point in the Nyquist diagram. This property can be exploited for a rudimentary form of frequency response analysis by using nonlinear elements in a feedback loop. The established limit cycle determines approximate values of the phase and magnitude of the transfer function involved.

Example 10.8—Pulse-width modulation (PWM) 

This type of modulation leads to asymmetric inputs and therefore needs a more complicated analysis with a nonzero constant term $c_0$ of the describing function.

Consider a pulse-width modulated (PWM) system that satisfies conditions i to iii cited previously. The nonlinearity is a switching element (see Fig. 10.11).
Figure 10.11 Describing function analysis applied to pulse-width modulation.


Consider first a case with a nonzero modulation frequency, $z(t) = z_1\sin(\omega_zt)$, imposed on the slowly varying signal $x$. The pulse-width modulating signal $z$ is often a sawtooth-shaped signal of high frequency. The pulse-width modulated signal is a square wave with a nonzero mean, which is high during a fraction of each modulation period.

A describing function to analyze the behavior of $u$ is then


u{t) = n(v, v ) « — x(t) + niz (10.95) 

zo 

where $n_1$ denotes the ordinary describing function terms. If the modulating frequency $\omega_z$ is very high compared with the bandwidth of $G_0(s)$, then the high-frequency components deriving from $z$ are effectively absorbed by the low-pass link $G_0$, so that

u{t) « — x(t) (10.96) 

zo 

The apparent linearization of the relay used for PWM is specific to the sawtooth form of $z$; other waveforms give other function characteristics. The method of PWM analysis is relevant for modeling of thyristor-controlled devices and other actuator implementations, and for control systems analysis and design. ■


10.7 BALANCED MODEL REDUCTION IN IDENTIFICATION 

There are two obvious ways to use balanced model reduction in the context of 
modeling and identification. First, a linearized model may be further reduced 
in order. A second application is to reduce a linear model as obtained from 
identification. 

This choice of model order is a possible alternative to statistical procedures based on loss function evaluation, AIC, FPE, etc., as practiced in identification, which have problems with consistent model-order estimation. Identification assisted by model reduction comprises two steps: (1) estimation of a high-order model (provided that the excitation is sufficient to allow high-order model estimation) and (2) model reduction with elimination of less important states as indicated by the Gramian $\Sigma$ obtained for the balanced realization.

Figure 10.12 Graphs of input and output from a first-order dynamic system (time axis in seconds).

Example 10.9—Model reduction in identification 
Consider estimation of the first-order system

$$y_{k+1} = 0.9y_k + 0.1u_k + d \tag{10.97}$$

with data according to Fig. 10.12. The large constant $d = 1$ results in a clear bias of the first-order least-squares estimate

$$y_{k+1} = 0.9838y_k + 0.2638u_k \tag{10.98}$$

A better result (see Fig. 10.13) is obtained by estimating a tenth-order model and reducing the estimate to a first-order model

$$\begin{aligned} x_{k+1} &= 0.9466x_k + 0.2912u_k\\ y_k &= 0.2912x_k - 0.0192u_k \end{aligned} \qquad Y(z) = \frac{-0.0192z + 0.1029}{z - 0.9466}U(z) \tag{10.99}$$

Figure 10.13 Loss function and estimated variance versus model order (1 to 10) as obtained from identification of higher-order models.


Both the step responses and the impulse responses of the reduced model still have a clear bias, which is, however, less prominent in magnitude. Figure 10.14 shows the step responses and impulse responses of the original system, the estimated first-order model Eq. (10.98), and the reduced first-order estimate Eq. (10.99); the corresponding Bode diagrams are shown in Fig. 10.15.

Example 10.10—An impulse response test 

Consider the data obtained from observation of the impulse response $\{g_i\}_{i=0}^{15}$ in Fig. 10.16, where the underlying impulse response is

$$g(t) = 1.2\cdot0.5^t - 0.2\cdot0.75^t \tag{10.100}$$
Figure 10.14 Step responses and impulse responses of the observed system (solid line), the first-order least-squares estimate (dashed line), and tenth- and reduced first-order models (dotted line). Notice that the least-squares estimate is poor due to the nonzero mean of the noise.

It is a problem that only few data points have been collected. Organizing the recorded impulse response $g$ as the state-space model

$$x_{k+1} = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}x_k + \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}u_k, \qquad y(k) = \begin{pmatrix} g_0 & g_1 & \cdots & g_{N-1} & g_N \end{pmatrix}x_k$$

and applying balanced model reduction to this model permits determination of the appropriate model order $n = 2$ and the estimate of the impulse response

$$\hat g(t) = 1.38\cdot0.523^t - 0.38\cdot0.703^t \tag{10.101}$$

It can be seen in Fig. 10.16 that the estimated impulse response and the recorded data are similar in magnitude and time course. However, the parameter accuracy is not very good, as can be expected from the few data and the incomplete impulse response. ■

Figure 10.15 Bode diagrams of the observed system (solid line), the first-order least-squares estimate (dashed line), and tenth- and reduced first-order models (dotted line). Notice that the least-squares estimate is poor due to the nonzero mean of the noise.
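The order selection in Example 10.10 rests on the singular values $\sigma_i$ of the shift-register realization. The NumPy sketch below uses noise-free samples of Eq. (10.100) as hypothetical data, since the measured data of Fig. 10.16 are not available here. It forms both Gramians as finite sums (the shift matrix is nilpotent) and computes $\sigma_i = \sqrt{\lambda_i(PQ)}$. Two singular values should clearly dominate, consistent with the choice $n = 2$.

```python
import numpy as np

# Noise-free samples g_0, ..., g_15 of Eq. (10.100): a hypothetical stand-in for measured data
g = np.array([1.2 * 0.5 ** t - 0.2 * 0.75 ** t for t in range(16)])
n = len(g)

Phi = np.eye(n, k=-1)              # shift register: ones on the subdiagonal
Gamma = np.zeros((n, 1))
Gamma[0, 0] = 1.0
C = g.reshape(1, n)                # y(k) = (g_0 g_1 ... g_15) x_k

# Phi is nilpotent, so the sums (10.14) and (10.19) terminate after n terms
P = sum(np.linalg.matrix_power(Phi, k) @ Gamma @ Gamma.T @ np.linalg.matrix_power(Phi.T, k)
        for k in range(n))
Q = sum(np.linalg.matrix_power(Phi.T, k) @ C.T @ C @ np.linalg.matrix_power(Phi, k)
        for k in range(n))

sigma = np.sqrt(np.sort(np.abs(np.linalg.eigvals(P @ Q)))[::-1])   # Eq. (10.23)
print(np.round(sigma[:4], 4))
```

With noisy data, as in Fig. 10.16, the small singular values are raised by the noise, so the gap between $\sigma_2$ and $\sigma_3$ is the quantity to look at rather than any absolute threshold.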
Figure 10.16 Data of an impulse response test with a fitted second-order model (upper). Notice that the data points ('o') and the impulse response ('x') of a second-order model fit closely, whereas the first-order reduced model ('*') gives a less accurate fit. The lower diagram shows, versus model order, the singular values $\sigma_i$ ($i = 1, 2, \ldots, 15$) obtained in the procedure to find the balanced realization.
10.9 EXERCISES


10.1 Show that the Pade approximation

$$G_1(s) = \frac{b}{s + a} \tag{10.102}$$

of the transfer function

$$G_2(s) = \frac{s + \beta}{(s + 1)(s + \alpha)} \tag{10.103}$$

is unstable for certain values of $\alpha$ and $\beta$.

10.2 The following model has been obtained from estimation of an ARX model from input data $u$ and output data $y$.

$$Y(z) = \frac{z - 1}{z^2 - 1.79z + 0.792}U(z) \tag{10.104}$$

In consideration of model reduction, the following balanced state-space model has been calculated

$$x_{k+1} = \begin{pmatrix} 0.7910 & -0.0423\\ 0.0423 & 0.9990 \end{pmatrix}x_k + \begin{pmatrix} 1.0001\\ 0.0118 \end{pmatrix}u_k, \qquad y_k = \begin{pmatrix} 1.0001 & -0.01198 \end{pmatrix}x_k \tag{10.105}$$

The eigenvalues of the Gramians are

$$\begin{pmatrix} \sigma_1\\ \sigma_2 \end{pmatrix} = \begin{pmatrix} 2.6837\\ 2.4035 \end{pmatrix} \tag{10.106}$$

The transformation matrix is

$$T = \begin{pmatrix} 0.7363 & 22.3504\\ -0.2638 & 22.3622 \end{pmatrix} \tag{10.107}$$

Is it advisable to reduce the model? If so, determine the reduced-order model. Otherwise, explain why the model reduction is not possible. ■

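10.3 Show that there is a transformation $x = T\tilde{x}$ such that the system

$$\begin{aligned} x_{k+1} &= \Phi x_k + \Gamma u_k \\ y_k &= C x_k \end{aligned} \qquad (10.108)$$

has equal controllability and observability Gramians which are diagonal:

$$P = Q = \operatorname{diag}(\sigma_1, \ldots, \sigma_n) \qquad (10.109)$$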

10.4 Give a geometric interpretation of balanced model reduction in Eq. (10.36) as a projection.

10.5 Consider a controllable state-space model

$$\dot{x} = Ax + Bu, \qquad x \in \mathbb{R}^n \qquad (10.110)$$

Let $P$ be the solution of the Lyapunov equation

$$AP + PA^T = -BB^T \qquad (10.111)$$

and let $T$ be the Cholesky factor of $P^{-1}$ so that $P^{-1} = T^T T$. Show that the components of $z = Tx$ for $u(t) = \delta(t)$ (i.e., the impulse responses) are orthogonal in the sense that

$$\int_0^\infty z(t) z^T(t)\,dt = \int_0^\infty T e^{At} B B^T e^{A^T t} T^T\,dt = I_{n\times n} \qquad (10.112)$$

How can this property be used in order to produce orthogonal regression variables for identification purposes?




11.1 INTRODUCTION 

Real-time application of identification algorithms is of interest for various purposes such as supervision, tracking of time-varying parameters for adaptive control, filtering, prediction, signal processing, detection, diagnosis, and artificial neural networks. However, most identification methods based on a set of measurements are not suitable for real-time application. It is therefore desirable to reformulate the algorithms into efficient recursive procedures.
The recursive identification algorithms are estimators of the type

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + P_k\phi_k\varepsilon_k \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= P_{k-1} - \frac{P_{k-1}\phi_k\phi_k^T P_{k-1}}{1 + \phi_k^T P_{k-1}\phi_k}, \qquad P_0 \text{ given} \end{aligned} \qquad (11.1)$$

with the parameter estimate $\hat\theta_k$, the regressor $\phi_k$, the prediction error $\varepsilon_k$, and the matrix $P_k$, which are all evaluated at time $k = 1, 2, 3, \ldots$.

There are several attractive features of algorithms with an organization similar to Eq. (11.1). Such an organization is obviously suitable for real-time applications, and only a few data need to be stored. It is therefore attractive also as a computational organization of off-line algorithms. In particular, it provides a method for identification of systems with time-varying parameters.

There are also certain drawbacks, such as the fact that the model structure must be determined a priori and the fact that iterative solutions based on larger data sets may be difficult to organize. Thus, it is of some interest to consider the modifications required for real-time application of algorithms originally stated as off-line methods. The following example shows one such simple derivation.

Example 11.1—Recursive estimate of a constant 
Consider the following noisy observation of a constant parameter

$$y_k = \theta + v_k, \qquad E\{v_k\} = 0, \qquad E\{v_i v_j\} = \sigma^2\delta_{ij} \qquad (11.2)$$

which is in linear regression form $y_k = \phi_k\theta + v_k$ with $\phi_k = 1$ for all $k$. A least-squares estimate is found as the sample average

$$\hat\theta_k = \frac{1}{k}\sum_{i=1}^{k} y_i \qquad (11.3)$$

In order to avoid the summation at every instant of time it is natural to include previously made summations in some state which is updated when new data arrive. A feasible choice of recursive state equation for Eq. (11.3) is

$$\hat\theta_k = \hat\theta_{k-1} + \frac{1}{k}\left(y_k - \hat\theta_{k-1}\right) \qquad (11.4)$$
According to Eq. (5.23) one can estimate the variance of the least-squares estimate as

$$p_k = E\{(\hat\theta_k - \theta)(\hat\theta_k - \theta)^T\} = \sigma^2\Big(\sum_{i=1}^{k}\phi_i\phi_i^T\Big)^{-1} = \frac{\sigma^2}{k} \qquad (11.5)$$

The parameter variance estimate in Eq. (11.5) can be expressed as a state variable which is updated in each recursion according to

$$p_k^{-1} = p_{k-1}^{-1} + \frac{1}{\sigma^2} \qquad (11.6)$$

or

$$p_k = \frac{\sigma^2 p_{k-1}}{\sigma^2 + p_{k-1}} \qquad (11.7)$$

where it can be noticed that $p_k \to 0$ as $k \to \infty$.
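As a minimal numerical check of Example 11.1, the sketch below runs the recursions (11.4), (11.6), and (11.7) and compares them with the batch quantities (11.3) and (11.5). The true value $\theta = 1$, the noise level $\sigma = 0.5$, and the 200 samples are arbitrary assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5                 # assumed true constant and noise level
y = theta + sigma * rng.standard_normal(200)

theta_hat = 0.0                         # overwritten in the first step (gain 1/1)
p_inv = 0.0                             # p_0^{-1} = 0: no prior information
for k, yk in enumerate(y, start=1):
    theta_hat += (yk - theta_hat) / k   # Eq. (11.4)
    p_inv += 1.0 / sigma**2             # Eq. (11.6)

p = sigma**2                            # p_1 = sigma^2 after the first sample
for _ in range(len(y) - 1):
    p = sigma**2 * p / (sigma**2 + p)   # Eq. (11.7)

print(theta_hat, y.mean())              # recursive vs. batch average, Eq. (11.3)
print(1.0 / p_inv, p, sigma**2 / len(y))  # all equal sigma^2 / k, Eq. (11.5)
```

Note that the gain $1/k$ in Eq. (11.4) equals $p_k/\sigma^2$, i.e., the matrix $P_k$ of Eq. (11.1) for $\phi_k = 1$.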


Derivation of recursive least-squares identification 

Recursive least-squares identification according to Eq. (11.1) can be derived 
from the ordinary least-squares estimate according to the following derivation. 


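Consider as usual the regressors $\phi_i$ and the observations $y_i$ collected in the matrices

$$\Phi_k = \begin{pmatrix} \phi_1^T \\ \vdots \\ \phi_k^T \end{pmatrix} \quad\text{and}\quad Y_k = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix} \qquad (11.8)$$

The least-squares criterion based on $k$ samples is

$$V(\theta_k) = \tfrac{1}{2}(Y_k - \Phi_k\theta_k)^T(Y_k - \Phi_k\theta_k) = \tfrac{1}{2}\varepsilon(\theta_k)^T\varepsilon(\theta_k) \qquad (11.9)$$

The ordinary least-squares estimate is

$$\hat\theta_k = (\Phi_k^T\Phi_k)^{-1}\Phi_k^T Y_k = \Big(\sum_{i=1}^{k}\phi_i\phi_i^T\Big)^{-1}\Big(\sum_{i=1}^{k}\phi_i y_i\Big) \qquad (11.10)$$

Introduce the matrix

$$P_k = \Big(\sum_{i=1}^{k}\phi_i\phi_i^T\Big)^{-1} = (\Phi_k^T\Phi_k)^{-1} \qquad (11.11)$$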
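A recursive updating is given by

$$\begin{aligned} P_k^{-1} &= P_{k-1}^{-1} + \phi_k\phi_k^T \\ \hat\theta_k &= P_k\Big(\sum_{i=1}^{k-1}\phi_i y_i + \phi_k y_k\Big) = P_k\big(P_{k-1}^{-1}\hat\theta_{k-1} + \phi_k y_k\big) = \hat\theta_{k-1} + P_k\phi_k\big(y_k - \phi_k^T\hat\theta_{k-1}\big) \end{aligned} \qquad (11.12)$$

where the last equality follows from substituting $P_{k-1}^{-1} = P_k^{-1} - \phi_k\phi_k^T$. It is also feasible to calculate the matrix $P_k$ instead of its inverse. Notice that

$$P_k = (\Phi_k^T\Phi_k)^{-1} = (\Phi_{k-1}^T\Phi_{k-1} + \phi_k\phi_k^T)^{-1} = (P_{k-1}^{-1} + \phi_k\phi_k^T)^{-1} \qquad (11.13)$$

In real-time operation it is usually difficult and computationally expensive to do matrix inversion. It is thus preferable to avoid such operations by means of a suitable reformulation of the problem. Application of the matrix inversion relation (see Appendix A)

$$(A + BC)^{-1} = A^{-1} - A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1} \qquad (11.14)$$

to the expression in Eq. (11.13), with $A = P_{k-1}^{-1}$, $B = \phi_k$, and $C = \phi_k^T$, is straightforward and we obtain

$$P_k = (P_{k-1}^{-1} + \phi_k\phi_k^T)^{-1} = P_{k-1} - P_{k-1}\phi_k(1 + \phi_k^T P_{k-1}\phi_k)^{-1}\phi_k^T P_{k-1} \qquad (11.15)$$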
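The rank-one update (11.15) can be checked numerically against direct inversion in Eq. (11.13). In the sketch below only the formulas come from the text; the regressors are hypothetical random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Phi_prev = rng.standard_normal((50, n))          # hypothetical phi_1, ..., phi_{k-1}
P_prev = np.linalg.inv(Phi_prev.T @ Phi_prev)    # P_{k-1}, Eq. (11.11)
phi = rng.standard_normal(n)                     # new regressor phi_k

# Direct inversion, Eq. (11.13)
P_direct = np.linalg.inv(np.linalg.inv(P_prev) + np.outer(phi, phi))

# Matrix inversion lemma, Eq. (11.15): only a scalar is inverted
Pphi = P_prev @ phi                              # P_{k-1} phi_k (P_{k-1} is symmetric)
P_update = P_prev - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)

print(np.max(np.abs(P_direct - P_update)))       # round-off level
```

For a scalar output the quantity $1 + \phi_k^T P_{k-1}\phi_k$ is a scalar, so each update costs $O(n^2)$ operations rather than the $O(n^3)$ of a matrix inversion.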

By collecting these formulae one can verify that the recursive least-squares identification in Eq. (11.1) will result in the same parameter estimate $\hat\theta$ as the least-squares method provided that the initial value $P_0$ of the recursive equation for $P_k$ is chosen appropriately, i.e., so that $P_0^{-1} = 0$. This requirement can easily be satisfied in algorithms that update $P_k^{-1}$ but is impossible for updating procedures similar to Eq. (11.15). A heuristic approach is to choose $P_0$ as an identity matrix multiplied by some large number. However, such an approach does not actually solve the problem of initial estimates and may also cause large initial transients in the parameter estimates. A systematic solution to this problem is therefore to make an initial estimate by means of ordinary least-squares identification, which then provides initial values both for $\hat\theta_0$ and $P_0$.
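The systematic initialization described above can be sketched as follows: ordinary least squares on an initial block of $N_0$ samples supplies $\hat\theta_0$ and $P_0 = (\Phi^T\Phi)^{-1}$, after which the recursion (11.1) should reproduce the batch estimate (11.10) on all data up to round-off. The data, the true parameters, and the block size $N_0 = 20$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, N0 = 300, 3, 20                  # N0: size of the initial block (assumed)
Phi = rng.standard_normal((N, n))      # hypothetical regressors, rows phi_k^T
theta_true = np.array([0.5, -1.0, 2.0])
Y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

# Initial values from ordinary least squares on the first N0 samples
P = np.linalg.inv(Phi[:N0].T @ Phi[:N0])      # P_0 = (Phi^T Phi)^{-1}
theta = P @ (Phi[:N0].T @ Y[:N0])             # theta_0, Eq. (11.10)

# Recursive least squares, Eq. (11.1), on the remaining samples
for phi, y in zip(Phi[N0:], Y[N0:]):
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
    theta = theta + P @ phi * (y - phi @ theta)

theta_batch = np.linalg.lstsq(Phi, Y, rcond=None)[0]   # Eq. (11.10) on all data
print(np.max(np.abs(theta - theta_batch)))
```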


11.2 RECURSIVE LEAST-SQUARES IDENTIFICATION 


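The recursive identification algorithm is

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + P_k\phi_k\varepsilon_k \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= P_{k-1} - \frac{P_{k-1}\phi_k\phi_k^T P_{k-1}}{1 + \phi_k^T P_{k-1}\phi_k}, \qquad P_0 \text{ given} \end{aligned} \qquad (11.16)$$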
Figure 11.1 A first-order system $y_{k+1} = ay_k + bu_k + w_{k+1}$ with input $u$ and output $y$ identified by recursive least-squares estimation of $a$ and $b$. The correct parameter values are $a = 0.9$ and $b = 0.1$ and are indicated by dotted lines.

where $\hat\theta_k$ is the parameter estimate and $\varepsilon_k$ is the prediction error. The matrix $P_k$ constitutes, except for a factor $\sigma^2$, an estimate of the parameter covariance at recursion number $k$.

Example 11.2—Recursive least-squares estimation 

Consider recursive identification by means of the recursive formulae in Eq. (11.1) when applied to data generated by the system

$$\mathcal{S}: \quad y_{k+1} = ay_k + bu_k + w_{k+1}, \qquad \text{where } a = 0.9,\ b = 0.1 \qquad (11.17)$$

and where $\{w_k\}$ is a white-noise sequence. Some graphs showing input-output data and parameter convergence are found in Fig. 11.1. ■
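A minimal sketch of Example 11.2 with the recursion (11.16) is given below. The text does not specify the input signal, the noise level, the data length, or the initial values, so a random binary input, white noise with standard deviation 0.1, $N = 2000$, $\hat\theta_0 = 0$, and the heuristic $P_0 = 1000\,I$ are all assumptions.

```python
import numpy as np

def rls_step(theta, P, phi, y):
    """One step of recursive least squares, Eq. (11.16)."""
    eps = y - phi @ theta                              # prediction error
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)  # P_k
    theta = theta + P @ phi * eps                      # uses the updated P_k
    return theta, P

rng = np.random.default_rng(0)
N, a, b = 2000, 0.9, 0.1                 # true parameters of Eq. (11.17)
u = rng.choice([-1.0, 1.0], size=N)      # assumed binary input
w = 0.1 * rng.standard_normal(N)         # assumed white noise, std 0.1
y = np.zeros(N)
for k in range(N - 1):
    y[k + 1] = a * y[k] + b * u[k] + w[k + 1]

theta, P = np.zeros(2), 1000.0 * np.eye(2)   # heuristic initialization
for k in range(1, N):
    phi = np.array([y[k - 1], u[k - 1]])     # regressor for y_k
    theta, P = rls_step(theta, P, phi, y[k])
print("estimates of (a, b):", theta)
```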

Some properties of recursive least-squares estimation 

A natural question in the context of recursive identification is how to evaluate parameter accuracy and convergence in the course of recursions. The following convergence analysis provides some relevant partial results.

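Consider the following quadratic function of the parameter error $\tilde\theta_k = \hat\theta_k - \theta$:

$$Q(\tilde\theta_k) = \tfrac{1}{2}(\hat\theta_k - \theta)^T P_k^{-1}(\hat\theta_k - \theta) = \tfrac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k \qquad (11.18)$$

This function develops in each recursion according to

$$\begin{aligned} Q(\tilde\theta_k) - Q(\tilde\theta_{k-1}) &= \tfrac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k - \tfrac{1}{2}\tilde\theta_{k-1}^T P_{k-1}^{-1}\tilde\theta_{k-1} \\ &= \tfrac{1}{2}(\tilde\theta_{k-1}^T\phi_k)^2 + \tilde\theta_{k-1}^T\phi_k\varepsilon_k + \tfrac{1}{2}\phi_k^T P_k\phi_k\varepsilon_k^2 \\ &= \tfrac{1}{2}(\tilde\theta_{k-1}^T\phi_k + \varepsilon_k)^2 + \tfrac{1}{2}(-1 + \phi_k^T P_k\phi_k)\varepsilon_k^2 \\ &= \tfrac{1}{2}(\tilde\theta_{k-1}^T\phi_k + \varepsilon_k)^2 - \tfrac{1}{2}\,\frac{\varepsilon_k^2}{1 + \phi_k^T P_{k-1}\phi_k} \end{aligned} \qquad (11.19)$$

where the last step uses $\phi_k^T P_k\phi_k = \phi_k^T P_{k-1}\phi_k/(1 + \phi_k^T P_{k-1}\phi_k)$, which follows from Eq. (11.15). Under the linear model assumption $y_k = \phi_k^T\theta + v_k$, so that $\varepsilon_k = -\tilde\theta_{k-1}^T\phi_k + v_k$, one can conclude that

$$Q(\tilde\theta_k) - Q(\tilde\theta_{k-1}) = \tfrac{1}{2}v_k^2 - \tfrac{1}{2}\,\frac{\varepsilon_k^2}{1 + \phi_k^T P_{k-1}\phi_k} \qquad (11.20)$$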


Clearly, in the noise-free case with $v_k = 0$ for all $k$ it holds that $Q(\tilde\theta_k)$ does not increase in any recursion step, and it decreases strictly whenever $\varepsilon_k \neq 0$. Moreover, if $Q(\tilde\theta_k)$ tends to zero it is implied that $\|\tilde\theta_k\|^2$ tends to zero, as the sequence of weighting matrices $\{P_k^{-1}\}$ is an increasing sequence of positive definite matrices with $P_k^{-1} \ge P_{k-1}^{-1}$ for all $k > 0$. In such a case parameter convergence follows. It should, however, be borne in mind that it is difficult to verify a priori conditions under which $Q(\tilde\theta_k)$ tends to zero. This is further complicated in the case of nonzero disturbances, for which we give the following result that is valid both for ordinary least-squares identification and for recursive least-squares identification.
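Because Eq. (11.20) is an exact identity for every recursion step, it can be checked numerically. The sketch below runs Eq. (11.1) on a hypothetical linear regression with random regressors and noise, and compares the observed change of $Q$ in Eq. (11.18) with the right-hand side of Eq. (11.20). The true parameters, the noise level, and $P_0 = 10\,I$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 2, 500
theta_true = np.array([0.9, 0.1])               # hypothetical true parameters
theta, P = np.zeros(n), 10.0 * np.eye(n)        # assumed initial values

def Q_of(theta, P):
    """Q in Eq. (11.18): 0.5 * err^T P^{-1} err."""
    err = theta - theta_true
    return 0.5 * err @ np.linalg.solve(P, err)

Q_prev = Q_of(theta, P)
max_mismatch = 0.0
for k in range(N):
    phi = rng.standard_normal(n)                # hypothetical regressor
    v = 0.1 * rng.standard_normal()             # disturbance v_k
    y = phi @ theta_true + v                    # linear model y_k = phi^T theta + v_k
    eps = y - phi @ theta                       # prediction error
    Pphi = P @ phi
    denom = 1.0 + phi @ Pphi                    # 1 + phi_k^T P_{k-1} phi_k
    P = P - np.outer(Pphi, Pphi) / denom        # Eq. (11.1)
    theta = theta + P @ phi * eps
    Q = Q_of(theta, P)
    rhs = 0.5 * v**2 - 0.5 * eps**2 / denom     # right-hand side of Eq. (11.20)
    max_mismatch = max(max_mismatch, abs((Q - Q_prev) - rhs))
    Q_prev = Q
print("largest deviation from Eq. (11.20):", max_mismatch)
```

Setting the noise to zero in this sketch makes the right-hand side nonpositive, which is the monotone decrease of $Q$ discussed above.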

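Theorem 11.1

The errors of estimated parameters and the prediction error for least-squares estimation have a bound determined by the noise magnitude according to

$$V(\hat\theta_k) + Q(\tilde\theta_k) = \tfrac{1}{2}\varepsilon^T(\hat\theta_k)\varepsilon(\hat\theta_k) + \tfrac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k = \tfrac{1}{2}v^T v \qquad (11.21)$$

where $v = (v_1\ \cdots\ v_k)^T$ collects the noise components. (The proof is called for in Exercise 11.1.)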
Theorem 11.1 thus states that a certain weighted sum of squared prediction errors and squared parameter errors equals the sum of squared noise components. As the sequence of weighting matrices $\{P_k^{-1}\}$ is increasing, with $P_k^{-1} = P_0^{-1} + \Phi_k^T\Phi_k$ according to the recursion formula for $P_k^{-1}$, one finds for recursive least-squares identification that

$$\tfrac{1}{2}\|\tilde\theta_k\|^2_{P_0^{-1}} = \tfrac{1}{2}v^T v - \tfrac{1}{2}\varepsilon^T(\hat\theta_k)\varepsilon(\hat\theta_k) - \tfrac{1}{2}\tilde\theta_k^T\Phi_k^T\Phi_k\tilde\theta_k \qquad (11.22)$$

Under conditions of a stationary stochastic process $\{v_k\}$ with

$$\Phi_k^T\Phi_k \ge kc\,I_{p\times p}, \qquad c > 0 \text{ constant} \qquad (11.23)$$

one can then conclude parameter convergence. However, conditions become more complicated when Eq. (11.23) is not valid, which may happen, for instance, in cases of a poor parameterization or for inputs of insufficient excitation. Poor convergence properties have been demonstrated in cases of large disturbances and a rank-deficient matrix $\Phi_k^T\Phi_k$.

Hence, several important algorithm properties can be derived from the behavior of the $P_k$-matrix. For instance, $P_k$ is a symmetric and positive definite matrix ($P_k = P_k^T > 0$) such that $P_k \to 0$ as $k \to \infty$ when the excitation condition in Eq. (11.23) holds. The matrix $P_k$ is asymptotically proportional to the parameter estimate covariance provided that a correct model structure has been used. For this reason it is often called the "covariance matrix."

The result obtained from recursive least-squares estimation is the same as that of ordinary least-squares identification if the initial values $P_0$ and $\hat\theta_0$ can be chosen to be compatible with the results of the ordinary least-squares method. Another approach is to make an initial least-squares estimate by solving the normal equations for some block of initial data. Recursive identification then provides a procedure for updating the parameter estimates.

Modification for time-varying parameters 

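The recursive least-squares estimation in Eq. (11.16) gives equal weighting to old data and new data. It is, however, natural to pay less attention to old data in many applications where time-varying parameters should be tracked. This can be achieved by introducing the "forgetting factor" $\lambda$ and the modified performance criterion of least-squares estimation

$$J(\theta_k) = \tfrac{1}{2}\sum_{i=1}^{k}\lambda^{k-i}\big(y_i - \phi_i^T\theta_k\big)^2, \qquad 0 < \lambda < 1 \qquad (11.24)$$

whose recursive minimization gives

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + P_k\phi_k\varepsilon_k \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= \frac{1}{\lambda}\left(P_{k-1} - \frac{P_{k-1}\phi_k\phi_k^T P_{k-1}}{\lambda + \phi_k^T P_{k-1}\phi_k}\right) \end{aligned} \qquad (11.25)$$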


This algorithm emphasizes the fitting of recent data and reduces the influence of old data. The algorithm in Eq. (11.25) has, however, some undesirable side effects. One is a noise sensitivity that becomes more prominent as $\lambda$ decreases. Another is that the $P_k$-matrix may grow as $k$ increases if the input is such that the magnitude of $P_{k-1}\phi_k$ is small: the subtracted term in Eq. (11.25) then almost vanishes, so that $P_k \approx P_{k-1}/\lambda$ grows geometrically ("$P$-matrix explosion" or "covariance matrix explosion").

The choice of the forgetting factor $\lambda$ is determined by a trade-off between the required ability to track time-varying parameters (i.e., a small value of $\lambda$) and the noise sensitivity allowed. A low value of $\lambda$ results in a system with a good ability to track time-varying parameters but may also give variations in the estimated parameters caused by disturbances. A value of $\lambda$ close to 1 is less sensitive to disturbances but compromises the ability to track rapid variations in the parameters. A default choice of $\lambda$ is in the range $0.97 < \lambda < 0.995$, although the appropriate choice depends, of course, both on the characteristics of the identified process and on the sampling frequency (see Fig. 11.2). The number of data points kept in "memory" can roughly be calculated as

$$N \approx \frac{1}{1 - \lambda} \qquad (11.26)$$

which corresponds to the time constant associated with $\lambda$; for example, $\lambda = 0.98$ gives a memory of about 50 samples.


Example 11.3—Choice of forgetting factor 

Consider once again the process of Example 11.1 and assume that the parameter $\theta$ changes abruptly from $\theta = 1$ to the new value $\theta = 2$ at some unknown time. The effect of the forgetting factors $\lambda = 0.99, 0.98, 0.95$ on the tracking of the time-varying parameter can be seen in Fig. 11.2. A lower value of the forgetting factor gives a more rapid response in tracking the new parameter value. A value of $\lambda$ closer to 1 means that a longer data record is kept, so that old data have a considerable impact on the parameter estimate. However, small values of the forgetting factor $\lambda$ result in a parameter estimate that becomes susceptible to noise. The forgetting factor therefore needs to be chosen as a trade-off between noise sensitivity and parameter tracking capability. ■
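The trade-off in Example 11.3 can be reproduced with a short sketch of Eq. (11.25) for the scalar model $\phi_k = 1$. The noise standard deviation 0.2, the jump at sample 300, and $P_0 = 100$ are assumptions; the text does not give these values. The sketch reports how many samples each estimate needs to pass the midpoint 1.5 after the jump, and the spread of the estimate before the jump.

```python
import numpy as np

def rls_forgetting_scalar(y, lam, p0=100.0):
    """RLS with forgetting factor, Eq. (11.25), for the constant model of
    Example 11.1 (phi_k = 1 for all k). Returns the estimate sequence."""
    theta, p = 0.0, p0
    estimates = np.empty(len(y))
    for k, yk in enumerate(y):
        eps = yk - theta                          # prediction error
        p = (p - p * p / (lam + p)) / lam         # P_k update; equals p / (lam + p)
        theta = theta + p * eps
        estimates[k] = theta
    return estimates

rng = np.random.default_rng(1)
n, k_jump = 600, 300
theta_true = np.where(np.arange(n) < k_jump, 1.0, 2.0)   # abrupt change 1 -> 2
y = theta_true + 0.2 * rng.standard_normal(n)            # assumed noise level

results = {lam: rls_forgetting_scalar(y, lam) for lam in (0.99, 0.98, 0.95)}
for lam, est in results.items():
    delay = int(np.argmax(est[k_jump:] > 1.5))           # samples to pass the midpoint
    print(f"lambda={lam}: memory ~{1 / (1 - lam):.0f} samples, "
          f"delay {delay} samples, std before jump {est[100:k_jump].std():.3f}")
```

For $\phi_k = 1$ the recursion reduces to $p_k = p_{k-1}/(\lambda + p_{k-1})$, whose fixed point is $p = 1 - \lambda$; the estimator thus settles into exponential smoothing with gain $1 - \lambda$, the inverse of the memory length in Eq. (11.26).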
Figure 11.2 Demonstration of the influence of the choice of forgetting factor. The number of data points in memory according to Eq. (11.26) is indicated (dashed line) for various forgetting factors.

An alternative approach to keeping the least-squares estimate up to date is to retain a block of recent data and discard old data according to some criterion of age. Such methods are applications of ordinary identification methods, for which the reader is referred to other chapters of this book.

Another important issue is how to improve numerical properties for sampling intervals that are small relative to the time constants of the process. One important approach is to reformulate the recursive algorithms in terms of the $\delta$-operator

$$\delta = \frac{z - 1}{h}, \qquad\text{or}\qquad z = 1 + h\delta \qquad (11.27)$$

with the state-space system representation

$$\begin{cases} x_{k+1} = \Phi x_k + \Gamma u_k \\ y_k = Cx_k \end{cases} \qquad\Longleftrightarrow\qquad \begin{cases} \delta x_k = \Phi' x_k + \Gamma' u_k = \dfrac{1}{h}(\Phi - I)x_k + \dfrac{1}{h}\Gamma u_k \\ y_k = Cx_k \end{cases} \qquad (11.28)$$

This reformulation makes the state-space realization and the corresponding system identification less error-prone due to favorable numerical scaling properties of the $\Phi'$- and $\Gamma'$-matrices as compared to the ordinary $z$-transform based algebra; see Middleton and Goodwin (1990) for details.
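The scaling argument can be illustrated numerically. The sketch below assumes zero-order-hold sampling of a hypothetical continuous-time system $\dot x = Ax + Bu$ and shows that, as $h$ decreases, $\Phi$ crowds toward $I$ (all information sits in the small difference $\Phi - I$), whereas $\Phi' = (\Phi - I)/h$ and $\Gamma' = \Gamma/h$ of Eq. (11.28) stay of order one and approach $A$ and $B$.

```python
import numpy as np
from scipy.linalg import expm

def zoh_shift_and_delta(A, B, h):
    """Zero-order-hold sampling in shift form (Phi, Gamma) and in delta form
    (Phi' = (Phi - I)/h, Gamma' = Gamma/h), cf. Eq. (11.28)."""
    n, m = B.shape
    # Augmented-matrix trick: expm([[A, B], [0, 0]] h) = [[Phi, Gamma], [0, I]]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * h)
    Phi, Gamma = E[:n, :n], E[:n, n:]
    return Phi, Gamma, (Phi - np.eye(n)) / h, Gamma / h

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # hypothetical continuous-time system
B = np.array([[0.0], [1.0]])
for h in (0.1, 0.01, 0.001):
    Phi, Gamma, Phi_d, Gamma_d = zoh_shift_and_delta(A, B, h)
    print(f"h={h}: max|Phi - I| = {np.abs(Phi - np.eye(2)).max():.1e}, "
          f"max|Phi' - A| = {np.abs(Phi_d - A).max():.1e}")
```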

A Kalman filter interpretation 

Assume that the time-varying system parameter $\theta$ may be described by the state-space equation

$$\begin{aligned} \theta_{k+1} &= \theta_k + v_k, && E\{v_i\} = 0, \quad E\{v_i v_j^T\} = R_1\delta_{ij}, \quad \forall i, j \\ y_k &= \phi_k^T\theta_k + e_k, && E\{e_i\} = 0, \quad E\{e_i e_j\} = R_2\delta_{ij}, \quad \forall i, j \end{aligned} \qquad (11.29)$$

where $\{y_k\}$ is interpreted as a sequence of scalar indirect observations of $\theta_k$. The Kalman filter for estimation of $\theta_k$ from observations of $y_k$ (see Appendix D) is

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + K_k\varepsilon_k \\ K_k &= P_{k-1}\phi_k/(R_2 + \phi_k^T P_{k-1}\phi_k) \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= P_{k-1} - \frac{P_{k-1}\phi_k\phi_k^T P_{k-1}}{R_2 + \phi_k^T P_{k-1}\phi_k} + R_1 \end{aligned} \qquad (11.30)$$

which corresponds to Eq. (11.25) except for the term $R_1$ added to $P_k$ and $R_2$ added to the denominator expression of the recursive equation for the estimated covariance matrix $P_k$. An important difference is, however, that the term $R_1$ added to $P_k$ changes the dynamics of $P_k$ for $\phi_k = 0$ from an exponential growth rate ($P_k = P_{k-1}/\lambda$ in Eq. (11.25)) to a linear growth rate ($P_k = P_{k-1} + R_1$). In addition, notice that $P_k$ of the Kalman filter does not approach zero as $k \to \infty$ for a nonzero sequence $\{v_k\}$.
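A minimal sketch of the Kalman filter (11.30) applied to the abrupt parameter change of Example 11.3 follows. The tuning $R_1 = 10^{-3}$, $R_2 = 0.04$ and the initial $P_0 = 100$ are assumed values. For $\phi_k = 1$ the covariance recursion becomes $P = PR_2/(R_2 + P) + R_1$, which converges to a positive stationary value instead of to zero, so the filter keeps a nonzero gain and can follow the jump.

```python
import numpy as np

def kalman_parameter_step(theta, P, phi, y, R1, R2):
    """One step of the Kalman filter (11.30) for the random-walk model (11.29)."""
    Pphi = P @ phi
    denom = R2 + phi @ Pphi                    # R_2 + phi_k^T P_{k-1} phi_k
    K = Pphi / denom                           # Kalman gain K_k
    eps = y - phi @ theta                      # prediction error eps_k
    theta = theta + K * eps
    P = P - np.outer(Pphi, Pphi) / denom + R1  # R_1 keeps P_k from vanishing
    return theta, P

rng = np.random.default_rng(2)
n, k_jump = 600, 300
theta_true = np.where(np.arange(n) < k_jump, 1.0, 2.0)  # abrupt change 1 -> 2
R1, R2 = 1e-3 * np.eye(1), 0.04                         # assumed tuning values
y = theta_true + np.sqrt(R2) * rng.standard_normal(n)

theta, P = np.zeros(1), 100.0 * np.eye(1)
phi = np.ones(1)                                        # phi_k = 1 as in Example 11.1
for yk in y:
    theta, P = kalman_parameter_step(theta, P, phi, yk, R1, R2)

# Stationary solution of the scalar recursion P = P R2 / (R2 + P) + R1
r1 = R1[0, 0]
P_ss = (r1 + np.sqrt(r1**2 + 4 * r1 * R2)) / 2
print("final estimate:", theta[0], " P_k:", P[0, 0], " stationary P:", P_ss)
```

Scaling $R_1$, $R_2$, and $P_0$ by a common factor leaves the gain $K_k$ unchanged, so in this sketch the ratio $R_1/R_2$ plays the role that $\lambda$ plays in Eq. (11.25).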


11.3 RECURSIVE INSTRUMENTAL VARIABLE METHODS 

The ordinary instrumental variable solution

$$\hat\theta_k = (Z^T\Phi)^{-1}Z^T Y = \Big(\sum_{i=1}^{k} z_i\phi_i^T\Big)^{-1}\Big(\sum_{i=1}^{k} z_i y_i\Big) \qquad (11.31)$$

with the instrumental variables $z_i$ collected in the matrix $Z$ may be reformulated as the recursive equation

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + K_k\varepsilon_k \\ K_k &= P_{k-1}z_k/(1 + \phi_k^T P_{k-1}z_k) \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= P_{k-1} - \frac{P_{k-1}z_k\phi_k^T P_{k-1}}{1 + \phi_k^T P_{k-1}z_k} \end{aligned} \qquad (11.32)$$

where the "instrument" vectors $z_k$ replace the regression vectors $\phi_k$. A standard choice of $z_k$ for identification of ARX models is

$$z_k = \begin{pmatrix} -x_{k-1} & \cdots & -x_{k-n_A} & u_{k-1} & \cdots & u_{k-n_B} \end{pmatrix}^T \qquad (11.33)$$

for some variables $x_k$ uncorrelated with the noise affecting the investigated system. The variables $x_k$ may be, for instance, the estimated output, i.e., the output of the estimated model driven by the input $u$ alone.

The recursive instrumental variable method has, however, some stability problems associated with the choice of the instrumental variables and the updating of the $P_k$-matrix.
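The sketch below applies the recursion (11.32) to the colored-noise system of Example 11.4. Two choices are assumptions, not the text's: the instruments are delayed inputs $z_k = (u_{k-2}, u_{k-1})^T$ (a simpler substitute for the model-output instruments of Eq. (11.33)), and the regressor uses the positive sign convention $\phi_k = (y_{k-1}, u_{k-1})^T$ so that the first estimate is $a$ itself. As suggested earlier in the chapter, $\hat\theta_0$ and $P_0$ come from a direct solution of Eq. (11.31) on an initial data block. Ordinary least squares is included for comparison. The input, noise level, and data length are also assumed.

```python
import numpy as np

def simulate(n, a=0.9, b=0.1, c=0.7, noise_std=0.1, seed=3):
    """System of Example 11.4: y_{k+1} = a y_k + b u_k + w_{k+1} + c w_k."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=n)           # assumed white binary input
    w = noise_std * rng.standard_normal(n)
    y = np.zeros(n)
    for k in range(n - 1):
        y[k + 1] = a * y[k] + b * u[k] + w[k + 1] + c * w[k]
    return u, y

def recursive_iv(u, y, n_init=200):
    """Recursive IV, Eq. (11.32), with regressor phi_k = (y_{k-1}, u_{k-1})
    and delayed-input instruments z_k = (u_{k-2}, u_{k-1}) (an assumed choice)."""
    ks = np.arange(2, len(y))
    Phi = np.column_stack([y[ks - 1], u[ks - 1]])
    Z = np.column_stack([u[ks - 2], u[ks - 1]])
    Y = y[ks]
    # Initial block solved directly by Eq. (11.31); this also supplies P_0
    P = np.linalg.inv(Z[:n_init].T @ Phi[:n_init])
    theta = P @ (Z[:n_init].T @ Y[:n_init])
    for phi, z, yk in zip(Phi[n_init:], Z[n_init:], Y[n_init:]):
        Pz = P @ z
        denom = 1.0 + phi @ Pz                    # 1 + phi_k^T P_{k-1} z_k
        eps = yk - phi @ theta                    # prediction error
        theta = theta + Pz / denom * eps          # K_k = P_{k-1} z_k / denom
        P = P - np.outer(Pz, phi @ P) / denom     # P_k update of Eq. (11.32)
    return theta, Phi, Y

u, y = simulate(50000)
theta_iv, Phi, Y = recursive_iv(u, y)
theta_ls = np.linalg.lstsq(Phi, Y, rcond=None)[0]     # ordinary least squares
print("recursive IV (a, b):", theta_iv)
print("least squares (a, b):", theta_ls)
```

With these assumed values the least-squares estimate of $a$ should show a systematic bias caused by the correlation between $y_{k-1}$ and the colored noise, whereas the instruments are uncorrelated with the noise.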


11.4 PSEUDOLINEAR REGRESSION 

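Pseudolinear regression is also called recursive maximum likelihood estimation or the extended least-squares method. The regression model is

$$y_k = \phi_k^T\theta + v_k, \qquad \theta = \begin{pmatrix} a_1 & \cdots & a_{n_A} & b_1 & \cdots & b_{n_B} & c_1 & \cdots & c_{n_C} \end{pmatrix}^T \qquad (11.34)$$

and the recursive estimation equation is

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + K_k\varepsilon_k \\ K_k &= P_{k-1}\phi_k/(1 + \phi_k^T P_{k-1}\phi_k) \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ P_k &= P_{k-1} - \frac{P_{k-1}\phi_k\phi_k^T P_{k-1}}{1 + \phi_k^T P_{k-1}\phi_k} \end{aligned} \qquad (11.35)$$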
Figure 11.3 A first-order system $y_{k+1} = ay_k + bu_k + w_{k+1} + cw_k$ with input $u$ and output $y$ identified by recursive least-squares estimation of $a$ and $b$ (upper right). Notice the bias due to colored noise. Pseudolinear regression (lower left) helps to reduce the systematic error by means of estimation of $a$, $b$, and $c$. The correct parameter values $a = 0.9$, $b = 0.1$, and $c = 0.7$ are indicated by dotted lines.

The regression vector that would be natural to use for linear regression is

$$\begin{pmatrix} -y_{k-1} & \cdots & -y_{k-n_A} & u_{k-1} & \cdots & u_{k-n_B} & v_{k-1} & \cdots & v_{k-n_C} \end{pmatrix}^T \qquad (11.36)$$

which, of course, cannot be implemented as the disturbance components $v_k$ are not available for measurement. The regression vector has therefore been modified to

$$\phi_k = \begin{pmatrix} -y_{k-1} & \cdots & -y_{k-n_A} & u_{k-1} & \cdots & u_{k-n_B} & \varepsilon_{k-1} & \cdots & \varepsilon_{k-n_C} \end{pmatrix}^T \qquad (11.37)$$

The name pseudolinear regression derives from the fact that disturbance components of the desired regression vector have been replaced by estimated variables. The method is also called the recursive maximum likelihood method as it tries to use the disturbance components as regressor elements and approximates these unknown elements by the calculated prediction errors $\{\varepsilon_k\}$. This method may also be modified to include a forgetting factor for identification of time-varying systems. The algorithm may also be modified to iterate for the best possible $\varepsilon_k$ within each recursion step.

Example 11.4 —Pseudolinear regression 

Consider recursive identification of parameters of the system

$$\mathcal{S}: \quad y_{k+1} = ay_k + bu_k + w_{k+1} + cw_k \qquad (11.38)$$

where $\{w_k\}$ is a zero-mean white-noise sequence. A comparison between the results for pseudolinear regression and recursive least-squares identification is shown in Fig. 11.3. ■

Pseudolinear regression is an approximate linear regression method, as the unknown regression elements $v_k$ are replaced by the residuals $\varepsilon_k$. Such a method is known to work well for ARMAX models with "moderate" noise correlations, i.e., ARMAX models whose $C$-polynomials are close to 1. This vague description can be substantiated by showing that parameter convergence can be expected under stationary conditions for $C$-polynomials satisfying the condition

$$\operatorname{Re}\left\{\frac{1}{C(z)} - \frac{1}{2}\right\}\bigg|_{z = e^{i\omega}} > 0, \qquad -\pi < \omega \le \pi \qquad (11.39)$$

for pseudolinear regression based on recursive least-squares identification.
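A minimal sketch of pseudolinear regression for Example 11.4 follows, together with a grid check of condition (11.39) for $C(z) = 1 + cz^{-1}$. Assumptions: the regressor uses the positive sign convention $\phi_k = (y_{k-1}, u_{k-1}, \varepsilon_{k-1})^T$ so that the estimates are directly $(a, b, c)$, and the input, noise level, data length, and $P_0 = 1000\,I$ are not taken from the text. The unmeasurable $w_{k-1}$ is replaced by the previous prediction error, as in Eq. (11.37).

```python
import numpy as np

def simulate(n, a=0.9, b=0.1, c=0.7, noise_std=0.1, seed=5):
    """System (11.38): y_{k+1} = a y_k + b u_k + w_{k+1} + c w_k."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=n)          # assumed white binary input
    w = noise_std * rng.standard_normal(n)
    y = np.zeros(n)
    for k in range(n - 1):
        y[k + 1] = a * y[k] + b * u[k] + w[k + 1] + c * w[k]
    return u, y

def pseudolinear_regression(u, y, p0=1000.0):
    """Recursive pseudolinear regression, Eqs. (11.35) and (11.37)."""
    theta = np.zeros(3)                          # estimates of (a, b, c)
    P = p0 * np.eye(3)
    eps_prev = 0.0                               # prediction error eps_{k-1}
    for k in range(1, len(y)):
        phi = np.array([y[k - 1], u[k - 1], eps_prev])
        eps = y[k] - phi @ theta                 # prediction error eps_k
        Pphi = P @ phi
        P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
        theta = theta + P @ phi * eps
        eps_prev = eps
    return theta

def plr_condition_margin(c, n_grid=2001):
    """min over omega of Re{1/C(e^{i omega}) - 1/2} for C(z) = 1 + c z^{-1}."""
    omega = np.linspace(-np.pi, np.pi, n_grid)
    C = 1.0 + c * np.exp(-1j * omega)
    return np.min(np.real(1.0 / C) - 0.5)

u, y = simulate(20000)
theta = pseudolinear_regression(u, y)
print("PLR estimate (a, b, c):", theta)
print("margin in condition (11.39) for c = 0.7:", plr_condition_margin(0.7))
```

For this first-order $C$ the minimum in Eq. (11.39) occurs at $\omega = 0$ and equals $1/(1 + c) - 1/2$, which is positive for $c = 0.7$.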


11.5 STOCHASTIC GRADIENT METHODS 

This method is sometimes called stochastic approximation, or least mean square (LMS), as advocated by Widrow and in the field of digital signal processing. As a numerical method it is called "steepest descent." This type of algorithm has been reinvented by several authors. A basic property is that it contains no state to represent collected data; the estimated parameters constitute the full state of the algorithm. Typically, the stochastic gradient methods take on the form

$$\hat\theta_k = \hat\theta_{k-1} + \gamma_k\phi_k\varepsilon_k \qquad (11.40)$$

where $\gamma_k$ is a sequence of step lengths satisfying the properties

$$\sum_{k=0}^{\infty}\gamma_k = \infty, \qquad \sum_{k=0}^{\infty}\gamma_k^2 < \infty, \qquad\text{and}\qquad \gamma_k > 0, \ \forall k \qquad (11.41)$$

For example, $\gamma_k = 1/(k+1)$ satisfies all three conditions.
Figure 11.4 Averaged convergence trajectories for recursive least-squares identification (RLS) and least mean-square identification (LMS) for 100 realizations of the data from the system in Example 11.1. The correct parameter values are indicated by + (upper graphs) and by dotted lines (lower graphs). The RLS identification appears to have faster and more accurate convergence.

A modification to include a time-varying regressor-dependent gain is

$$\begin{aligned} \hat\theta_k &= \hat\theta_{k-1} + \gamma_k\varepsilon_k \\ \varepsilon_k &= y_k - \phi_k^T\hat\theta_{k-1} \\ \gamma_k &= Q\phi_k/r_k, \qquad Q = Q^T > 0 \\ r_k &= r_{k-1} + \phi_k^T Q^{-1}\phi_k \end{aligned} \qquad (11.42)$$

where $Q$ is some positive definite weighting matrix. An advantage of this method is that rapid computations are possible, as there is no $P_k$-matrix to evaluate. There is thus little influence on the parameter estimate from past data, which, of course, is often good for detection of time-varying parameters. However, there are several drawbacks, such as slow convergence and noise sensitivity compared with the least-squares-type estimation methods; see Fig. 11.4.
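The slow convergence of the gradient method can be illustrated with a short sketch that runs Eq. (11.42) and Eq. (11.1) side by side on data from the system of Example 11.2. Assumptions: $Q = I$ (so that $Q^{-1} = Q$ and the choice of weighting does not matter), $r_0 = 1$, $P_0 = 1000\,I$, a random binary input, and noise standard deviation 0.1. With $Q = I$ the gain is $\phi_k/r_k$ with $r_k$ growing roughly linearly in $k$, so the estimate of $a$, whose regressor $y_{k-1}$ has small variance, creeps toward the true value very slowly.

```python
import numpy as np

rng = np.random.default_rng(6)
N, a, b = 2000, 0.9, 0.1                   # system of Example 11.2
u = rng.choice([-1.0, 1.0], size=N)        # assumed binary input
w = 0.1 * rng.standard_normal(N)           # assumed noise level
y = np.zeros(N)
for k in range(N - 1):
    y[k + 1] = a * y[k] + b * u[k] + w[k + 1]

theta_sg, r = np.zeros(2), 1.0             # gradient method (11.42), Q = I, r_0 = 1
theta_rls, P = np.zeros(2), 1000.0 * np.eye(2)   # RLS (11.1), P_0 = 1000 I
for k in range(1, N):
    phi = np.array([y[k - 1], u[k - 1]])
    # Normalized stochastic gradient step: gamma_k = phi_k / r_k
    r += phi @ phi
    theta_sg = theta_sg + phi / r * (y[k] - phi @ theta_sg)
    # Recursive least-squares step
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
    theta_rls = theta_rls + P @ phi * (y[k] - phi @ theta_rls)

print("stochastic gradient (a, b):", theta_sg)
print("recursive least squares (a, b):", theta_rls)
```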
Time-varying parameters

The stochastic gradient methods can be applied to time-varying systems without modification. A minor modification of Eq. (11.42) is to include a forgetting factor $\lambda$ so that

$$r_k = \lambda r_{k-1} + \phi_k^T Q^{-1}\phi_k, \qquad 0 < \lambda < 1 \qquad (11.43)$$

This algorithm tends to keep the factor $r_k$ at a lower magnitude: for bounded regressors $r_k$ remains bounded instead of growing linearly with $k$, so the gain $\gamma_k$ does not decay to zero and the estimate retains its ability to follow parameter changes.


11.6 THE LEVINSON-DURBIN ALGORITHM 


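Many identification algorithms involve some evaluation of a set of output variables without any known corresponding input variables. Such outputs are often assumed to be filtered white noise, and it is of interest to provide good algorithms for analysis of such data. The Levinson-Durbin algorithm provides an efficient recursive solution to the Yule-Walker equations by using the Toeplitz structure of the correlation matrix in the equation

$$\begin{pmatrix} C_{yy}(0) & C_{yy}(1) & \cdots & C_{yy}(n) \\ C_{yy}(1) & C_{yy}(0) & \cdots & C_{yy}(n-1) \\ \vdots & \vdots & \ddots & \vdots \\ C_{yy}(n) & C_{yy}(n-1) & \cdots & C_{yy}(0) \end{pmatrix} \begin{pmatrix} 1 \\ a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} \sigma^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \qquad (11.44)$$

and determines the coefficients of the AR-process of order $n$. The algorithm proceeds recursively in model order and in time $k$ to compute the parameter sets

$$\big(a_1^{(1)}, \sigma_1^2\big),\ \big(a_1^{(2)}, a_2^{(2)}, \sigma_2^2\big),\ \ldots,\ \big(a_1^{(n)}, a_2^{(n)}, \ldots, a_n^{(n)}, \sigma_n^2\big) \qquad (11.45)$$

where a superscript has been added to the AR-coefficients to denote the model order and where the final set is the desired solution. Expressed in the linear regression terminology we introduce the variables

$$\begin{aligned} \phi_k^{(m)} &= \begin{pmatrix} y_k & y_{k-1} & \cdots & y_{k-m+1} \end{pmatrix}^T \\ a^{(m)} &= \begin{pmatrix} a_1^{(m)} & a_2^{(m)} & \cdots & a_m^{(m)} \end{pmatrix}^T \\ \bar a^{(m)} &= \begin{pmatrix} a_m^{(m)} & \cdots & a_2^{(m)} & a_1^{(m)} \end{pmatrix}^T \\ \varepsilon_k^{(m)} &= y_k - \hat y_k = y_k + (\phi_{k-1}^{(m)})^T a^{(m)} \\ b_k^{(m)} &= y_{k-m} - \hat y_{k-m} = y_{k-m} + (\phi_k^{(m)})^T \bar a^{(m)} \end{aligned} \qquad (11.46)$$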
Figure 11.5 A forward/backward prediction error filter according to the Levinson-Durbin algorithm.


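where $k$ denotes time index and $m$ model order. The algorithm can be derived by means of an alternative parametrization known as the lattice structure according to the relationships

$$a^{(m)} = \begin{pmatrix} a^{(m-1)} \\ 0 \end{pmatrix} + K_f^{(m)}\begin{pmatrix} \bar a^{(m-1)} \\ 1 \end{pmatrix}, \qquad \bar a^{(m)} = \begin{pmatrix} 0 \\ \bar a^{(m-1)} \end{pmatrix} + K_b^{(m)}\begin{pmatrix} 1 \\ a^{(m-1)} \end{pmatrix} \qquad (11.47)$$

which are sometimes called Levinson recursions. The coefficients denoted by $\{K_f^{(1)}, K_f^{(2)}, \ldots, K_f^{(n)}\}$ and $\{K_b^{(1)}, K_b^{(2)}, \ldots, K_b^{(n)}\}$ are known as reflection coefficients or partial correlation coefficients and exhibit a simple relationship to the autoregressive coefficients $\{a_1^{(n)}, a_2^{(n)}, \ldots, a_n^{(n)}\}$. To visualize the lattice structure we denote the prediction error for an $n$th order linear predictor of $y_k$ at time $k$ as

$$\begin{aligned} \varepsilon_k^{(n)} &= y_k + \sum_{j=1}^{n} a_j^{(n)} y_{k-j} \\ &= y_k + \sum_{j=1}^{n-1}\big(a_j^{(n-1)} + K_f^{(n)} a_{n-j}^{(n-1)}\big) y_{k-j} + K_f^{(n)} y_{k-n} \\ &= \varepsilon_k^{(n-1)} + K_f^{(n)} b_{k-1}^{(n-1)} \end{aligned} \qquad (11.48)$$

where Eq. (11.47) has been used and where we have introduced

$$b_k^{(n)} = y_{k-n} + \sum_{j=1}^{n} a_j^{(n)} y_{k-n+j} = y_{k-n} - \hat y_{k-n} \qquad (11.49)$$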

The term $b_k^{(n)}$ is called the backward prediction error, i.e., the error when one attempts to "predict" $y_{k-n}$ on the basis of the samples $y_{k-n+1}, \ldots, y_k$ (see Fig. 11.5).

Figure 11.6 Lattice formulation of prediction error filter according to the Levinson-Durbin algorithm. The autoregressive coefficients are reparametrized as reflection coefficients or partial correlation coefficients $\{K_f^{(i)}\}_{i=1}^{n}$ and $\{K_b^{(i)}\}_{i=1}^{n}$.

By an argument similar to Eq. (11.48) we summarize the recursive equations for model order $m$ and time $k$ as

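$$\begin{aligned} \varepsilon_k^{(m)} &= \varepsilon_k^{(m-1)} + K_f^{(m)} b_{k-1}^{(m-1)} && \text{forward prediction error} \\ b_k^{(m)} &= b_{k-1}^{(m-1)} + K_b^{(m)} \varepsilon_k^{(m-1)} && \text{backward prediction error} \end{aligned} \qquad (11.50)$$

where $\varepsilon_k^{(0)} = b_k^{(0)} = y_k$. The terminology of forward prediction error and backward prediction error derives from the interpretation of the two variables $\varepsilon_k^{(m)}$ and $b_k^{(m)}$, of model order $m$ and for time $k$, as the prediction errors $y_k - \hat y_k^{(m)}$ and $y_{k-m} - \hat y_{k-m}^{(m)}$, respectively. The prediction error equations in Eq. (11.50) have an order-recursive structure according to Fig. 11.6 with the transfer function relationships

$$\begin{pmatrix} E^{(m)}(z) \\ B^{(m)}(z) \end{pmatrix} = \begin{pmatrix} 1 & K_f^{(m)} z^{-1} \\ K_b^{(m)} & z^{-1} \end{pmatrix} \begin{pmatrix} E^{(m-1)}(z) \\ B^{(m-1)}(z) \end{pmatrix} \qquad (11.51)$$

where the algorithm is initialized by $E^{(0)}(z) = B^{(0)}(z) = Y(z)$.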
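The equivalence between the lattice recursions (11.50) and the direct-form error filters obtained through the Levinson recursions (11.47) can be checked numerically. In the sketch below the test signal and the reflection coefficients are hypothetical, and zero initial conditions ($y_k = 0$ for $k < 0$) are assumed for both forms.

```python
import numpy as np

def lattice_errors(y, Kf, Kb):
    """Forward and backward prediction errors of order n = len(Kf) computed by
    the lattice recursions (11.50), starting from eps^(0)_k = b^(0)_k = y_k."""
    eps = np.asarray(y, dtype=float).copy()
    b = eps.copy()
    for kf, kb in zip(Kf, Kb):
        b_delayed = np.concatenate([[0.0], b[:-1]])          # b^(m-1)_{k-1}
        eps, b = eps + kf * b_delayed, b_delayed + kb * eps   # uses order m-1 values
    return eps, b

def step_up(Kf, Kb):
    """Direct-form vectors a^(n) and abar^(n) from reflection coefficients,
    using the Levinson recursions (11.47)."""
    a, abar = np.zeros(0), np.zeros(0)
    for kf, kb in zip(Kf, Kb):
        a, abar = (np.concatenate([a, [0.0]]) + kf * np.concatenate([abar, [1.0]]),
                   np.concatenate([[0.0], abar]) + kb * np.concatenate([[1.0], a]))
    return a, abar

rng = np.random.default_rng(7)
y = rng.standard_normal(200)                  # arbitrary test signal
Kf = [0.5, -0.3, 0.2]                         # hypothetical reflection coefficients
Kb = [0.4, -0.1, 0.25]

eps_lat, b_lat = lattice_errors(y, Kf, Kb)
a, abar = step_up(Kf, Kb)
# Direct form: eps_k = y_k + sum_j a_j y_{k-j}, Eq. (11.48)
eps_dir = np.convolve(y, np.concatenate([[1.0], a]))[:len(y)]
# Direct form: b_k = y_{k-n} + (phi_k^(n))^T abar, Eq. (11.46)
b_dir = np.convolve(y, np.concatenate([abar, [1.0]]))[:len(y)]
print(np.max(np.abs(eps_lat - eps_dir)), np.max(np.abs(b_lat - b_dir)))
```

When $K_f^{(m)} = K_b^{(m)}$ for all $m$, the recursions (11.47) make $\bar a^{(n)}$ exactly the reversed $a^{(n)}$, which is the symmetric case that arises in Eq. (11.55).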


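Minimization of $E\{|\varepsilon_k^{(m)}|^2\}$ and $E\{|b_k^{(m)}|^2\}$ with respect to the reflection coefficients $K_f^{(m)}$ and $K_b^{(m)}$ gives the optimal lattice-structure parameters

$$K_f^{(m)} = -\frac{E\{\varepsilon_k^{(m-1)} b_{k-1}^{(m-1)}\}}{E\{|b_{k-1}^{(m-1)}|^2\}}, \qquad K_b^{(m)} = -\frac{E\{\varepsilon_k^{(m-1)} b_{k-1}^{(m-1)}\}}{E\{|\varepsilon_k^{(m-1)}|^2\}}, \qquad m = 1, 2, \ldots, n \qquad (11.52)$$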
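which follows from the mathematical expectation of the square of Eq. (11.50) or according to the orthogonality principle. Assuming that the reflection coefficients are chosen according to Eq. (11.52), one can evaluate the following residual covariances

$$p_{b\varepsilon}^{(m)} = E\{\varepsilon_k^{(m)} b_{k-1}^{(m)}\} = \begin{pmatrix} C_{yy}(1) & C_{yy}(2) & \cdots & C_{yy}(m+1) \end{pmatrix}\begin{pmatrix} \bar a^{(m)} \\ 1 \end{pmatrix}$$

and

$$\begin{aligned} \sigma_m^2 = p_{\varepsilon\varepsilon}^{(m)} &= E\{|\varepsilon_k^{(m)}|^2\} = p_{\varepsilon\varepsilon}^{(m-1)} + K_f^{(m)} p_{b\varepsilon}^{(m-1)}, && p_{\varepsilon\varepsilon}^{(0)} = C_{yy}(0) \\ p_{bb}^{(m)} &= E\{|b_k^{(m)}|^2\} = p_{bb}^{(m-1)} + K_b^{(m)} p_{b\varepsilon}^{(m-1)}, && p_{bb}^{(0)} = C_{yy}(0) \end{aligned} \qquad (11.53)$$

This algorithm is initialized by

$$\begin{aligned} \sigma_0^2 &= p_{\varepsilon\varepsilon}^{(0)} = p_{bb}^{(0)} = C_{yy}(0), \qquad p_{b\varepsilon}^{(0)} = C_{yy}(1) \\ a_1^{(1)} &= K_f^{(1)} = K_b^{(1)} = -C_{yy}(1)/C_{yy}(0) \end{aligned} \qquad (11.54)$$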

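As the recursive equations for $p_{\varepsilon\varepsilon}^{(m)}$ and $p_{bb}^{(m)}$ have the same initial values and take on the same value for all $m$, it suffices to evaluate $\sigma_m^2$ only; consequently $K_f^{(m)} = K_b^{(m)}$. Determination of the autoregressive parameters and lattice parameters can, thus, be simplified to the recursive equations

$$\begin{aligned} p_{b\varepsilon}^{(m-1)} &= \begin{pmatrix} C_{yy}(1) & C_{yy}(2) & \cdots & C_{yy}(m) \end{pmatrix}\begin{pmatrix} \bar a^{(m-1)} \\ 1 \end{pmatrix} \\ K_f^{(m)} &= K_b^{(m)} = -p_{b\varepsilon}^{(m-1)}/\sigma_{m-1}^2 \\ a_m^{(m)} &= K_f^{(m)} \\ a_j^{(m)} &= a_j^{(m-1)} + a_m^{(m)} a_{m-j}^{(m-1)}, \qquad j = 1, 2, \ldots, m-1 \\ \sigma_m^2 &= \big(1 - |a_m^{(m)}|^2\big)\sigma_{m-1}^2 \end{aligned} \qquad m = 1, 2, \ldots, n \qquad (11.55)$$

The autoregressive coefficients are, thus, found recursively from correlation data, and the transfer function of the full filter between input $y$ and residuals is

$$\frac{E^{(n)}(z)}{Y(z)} = A(z^{-1}) = 1 + a_1^{(n)} z^{-1} + \cdots + a_n^{(n)} z^{-n} \qquad (11.56)$$

which is the inverse of an AR-process. Hence, one can expect that $\{\varepsilon_k^{(n)}\}$ restores the white-noise input if the filter input $y$ is obtained from an $n$th order AR-process driven by zero-mean white noise.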
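The order recursion (11.55) translates almost line by line into code. The sketch below compares it with a direct solution of the Toeplitz system (11.44) for sample autocovariances of a simulated AR(2) process; the AR coefficients, the record length, and the model order $n = 4$ are hypothetical choices. The biased autocovariance estimate is used because it keeps the Toeplitz matrix positive definite, so all reflection coefficients should satisfy $|K^{(m)}| < 1$.

```python
import numpy as np

def levinson_durbin(C, n):
    """Order-recursive solution of the Yule-Walker equations (11.44) following
    Eq. (11.55). C holds C_yy(0), ..., C_yy(n). Returns (a_1, ..., a_n), the
    reflection coefficients K^(1..n), and sigma_n^2."""
    a = np.zeros(0)                              # a^(0) is empty
    sigma2 = C[0]                                # sigma_0^2 = C_yy(0)
    K = np.zeros(n)
    for m in range(1, n + 1):
        p = C[1:m + 1] @ np.concatenate([a[::-1], [1.0]])   # p_{b eps}^(m-1)
        k = -p / sigma2                                      # K^(m)
        a = np.concatenate([a + k * a[::-1], [k]])           # a_j^(m); a_m^(m) = K^(m)
        sigma2 = (1.0 - k * k) * sigma2
        K[m - 1] = k
    return a, K, sigma2

def yule_walker_direct(C, n):
    """Reference solution: solve the Toeplitz system R a = -(C(1), ..., C(n))."""
    R = np.array([[C[abs(i - j)] for j in range(n)] for i in range(n)])
    a = np.linalg.solve(R, -C[1:n + 1])
    return a, C[0] + C[1:n + 1] @ a

# Sample autocovariances of a simulated AR(2) process (hypothetical coefficients)
rng = np.random.default_rng(8)
N = 5000
y = np.zeros(N)
e = rng.standard_normal(N)
for k in range(2, N):
    y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + e[k]
n = 4
C = np.array([y[: N - tau] @ y[tau:] / N for tau in range(n + 1)])  # biased estimate

a_ld, K, s2_ld = levinson_durbin(C, n)
a_ref, s2_ref = yule_walker_direct(C, n)
print("Levinson-Durbin:", a_ld, s2_ld)
print("direct solve:   ", a_ref, s2_ref)
print("reflection coefficients:", K)
```

In the sign convention of Eq. (11.56), the simulated process corresponds to $A(z^{-1}) = 1 - 1.5z^{-1} + 0.7z^{-2}$, so for long records the estimates should lie near $(-1.5, 0.7, 0, 0)$. The recursion needs $O(n^2)$ operations, compared with $O(n^3)$ for a general linear solver.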
As a final remark it should be said that there are several reasons to implement AR-modeling by use of lattice structure calculations, as these algorithms are known to combine rapid computation with good numerical properties. In addition, the lattice structure has an interesting relationship to the continued fraction approximation (see Chapter 10), which justifies interpretations in terms of model reduction. Moreover, the model structure is valuable in attempts to identify processes without measurable input variables, where it bridges the gap between correlation analysis and state-space realizations. This property makes the lattice model popular and useful for many physical modeling applications, e.g., speech processing and inverse scattering models in physics.


11.7 SPECTRAL PROPERTIES 

Recursive identification methods are very much time-domain oriented, and only a few connections to frequency-domain methods appear. One problem is that the organization of most recursive estimation methods corresponds to spectrum estimation methods using rectangular or exponential windows with poor spectral leakage properties. Implementation of sliding windows requires significant block data processing, and spectral estimators of the periodogram or correlogram type are often unattractive for recursive implementation. Another problem is that the extraction of spectral information from ARMAX-type models is not trivial. It has been observed for processes consisting of a sinusoid in noise that the peak location in the autoregressive spectral estimate depends critically on the phase of the sinusoid. Also, the spectral estimate sometimes exhibits two closely spaced peaks, falsely indicating a second sinusoid. This phenomenon is known as spectral line splitting.

It is for this reason that recursive implementations of spectral estimators often take the form of block data processing or lattice algorithms.


11.8 BIBLIOGRAPHY AND REFERENCES 

The development of recursive identification methods has to a large extent 
evolved to solve the requirements of implementation, real-time application 
and adaptive systems. Detailed books that treat recursive identification meth¬ 
Further references are given in Chapter 15, which treats adaptive systems.


11.9 EXERCISES 

11.1 Show that for a linear regression model $Y_k = \Phi_k\theta + v$ and for $Q(\tilde\theta_k)$ according to Eq. (11.18) and for $V(\hat\theta_k) = \sum_{i=1}^{k}\varepsilon_i^T(\hat\theta_k)\varepsilon_i(\hat\theta_k)/2$, it holds for the least-squares solution $\hat\theta_k$ that

$$V(\hat\theta_k) + Q(\tilde\theta_k) = \tfrac{1}{2}v^T v \qquad (11.57)$$

Hint: Use the property $\varepsilon^T(\hat\theta_k)\Phi_k = 0$.

11.2 Adapt the proof of parameter convergence by means of the function $Q(\tilde\theta_k)$ as defined by Eq. (11.18) for the case with a forgetting factor $\lambda < 1$.

11.3 It is clear that the $P$-matrix in Eq. (11.13), except for a constant factor $\sigma^2$, asymptotically represents the parameter covariance in the case of constant parameters and an uncorrelated noise sequence. How should the recursive least-squares algorithm be modified in order to estimate the noise covariance $\sigma^2$?

11.4 Show that the function

$$V(\tilde\theta_k) = \tfrac{1}{2}\tilde\theta_k^T Q^{-1}\tilde\theta_k \qquad (11.58)$$

decreases in each step of the recursive identification algorithm in Eq. (11.42). Use this result to prove parameter convergence. ■




12.1 INTRODUCTION

Accurate knowledge of a continuous-time transfer function is a prerequisite of many methods in physical modeling and control system design. As we have seen in the earlier chapters, system identification is often done by applying time-series analysis to discrete-time transfer function models. A problem with such approaches, however, is that there exists no undisputed algorithm for parameter translation from discrete-time parameters to a continuous-time description. Problems in this context are associated with translation of the system zeros from the discrete-time model to the continuous-time model, whereas the system poles are mapped by means of complex exponentials ($z = e^{sh}$). As a result, a poor translation tends to affect both the frequency response, such as the Bode diagram, and transient responses, such as the impulse response. One source of error in many existing algorithms is that computation of the system zeros is affected by discrepancy between the assumed and the actual intersample behavior of the control variables.

Another systematic problem associated with the approach of ARMAX-model based parameter estimation is the following simple observation. Assume that a parameter of an ARMAX-type model changes abruptly to a new value. Detection of such a change and convergence of the parameter estimate to the new value would require a time proportional both to the sampling period and to the number of estimated parameters. This delay may be unacceptably long. Not only is it sometimes impossible to improve the response time, but a shorter sampling period may be incompatible with good parameter identifiability. Moreover, similar problems arise in adaptive control applications, which are often associated with recursive estimation methods. One obvious disadvantage of many discrete-time adaptive control schemes is that the sampling period must be chosen to provide good identifiability rather than good control action.

This chapter deals with the problem of estimating the transfer function of a continuous-time dynamic system in the presence of colored noise. We introduce an operator transformation that allows continuous-time parametrization, whereas the parameter estimation can be made by means of a discrete-time maximum-likelihood algorithm or a recursive algorithm. The performance of the continuous-time identification method is compared with that of standard identification of an ARMAX model. The method is useful in cases where it is important not only to estimate the coefficients of a continuous-time transfer function but also to maintain a physical interpretation of the transfer function results.


12.2 OUTLINE OF THE METHOD 

There is one approach to the identification of continuous-time systems that was developed in the 1950s for use on analog computers. The idea is to have “state variable filters” $F_1, \dots, F_n$ acting on the inputs and outputs of the continuous-time process; see Fig. 12.1.

Figure 12.1 Filter arrangement for continuous-time identification and filtered variables $y_1, \dots, y_n$ and $u_1, \dots, u_n$.

One possible alternative is to choose

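$$ F_i = \left(\frac{s}{1+s\tau}\right)^{i}, \qquad i = 0, 1, \dots, n $$

Let all $y_i, u_i$, $0 \le i \le n$, be outputs from the filter assembly

$$ y_i(t) = F_i\{y(t)\}, \qquad 0 \le i \le n \qquad (12.1) $$

$$ u_i(t) = F_i\{u(t)\}, \qquad 0 \le i \le n \qquad (12.2) $$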

These filter outputs will then give approximate derivatives of the inputs and outputs. If the original input-output model is

$$ S:\quad \frac{d^n y}{dt^n} + a_1\frac{d^{n-1}y}{dt^{n-1}} + \dots + a_n y = b_1\frac{d^{n-1}u}{dt^{n-1}} + \dots + b_n u \qquad (12.3) $$
then we can fit the parameters $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n$ of the linear model

$$ y_n + \alpha_1 y_{n-1} + \dots + \alpha_n y = \beta_1 u_{n-1} + \dots + \beta_n u \qquad (12.4) $$

by parameter adjustment until $y_n$ and $\hat y_n$ agree, where

$$ \hat y_n = -\alpha_1 y_{n-1} - \dots - \alpha_n y + \beta_1 u_{n-1} + \dots + \beta_n u \qquad (12.5) $$

and the model misfit $\varepsilon = y_n - \hat y_n \to 0$. For appropriate choices of the state variable filters it holds that

$$ \alpha_i \approx a_i, \qquad \beta_i \approx b_i, \qquad 1 \le i \le n \qquad (12.6) $$

although the filtered signals are only approximately equal to the true state variables. Methods of this type have therefore been developed as a branch of instrumental variable methods; see Young (1969).


12.3 MODEL TRANSFORMATION 

In this section we introduce a modification of this algorithm that is based on an algebraic reformulation of transfer function models; in addition, we introduce discrete-time noise models. The idea is to find a causal, stable, realizable linear operator that may replace the differential operator without approximations. This must be done in a manner ensuring that we obtain a linear model for estimation of the original transfer function parameters $a_i, b_i$. Here we shall consider cases where we obtain a linear model in low pass filter operators. It will be shown that there is always a linear one-to-one transformation which relates the continuous-time parameters and the convergence points for each choice of filter. We then investigate the state-space properties of the introduced filters and the original model. The convergence rate of the parameter estimates is then considered. Finally, examples are given, with applications to time-invariant as well as time-varying systems.

Consider a linear nth-order transfer operator formulated with the differential operator $p = d/dt$ and unknown coefficients $a_i, b_i$:

$$ G_0(p) = \frac{b_1 p^{n-1} + \dots + b_n}{p^n + a_1 p^{n-1} + \dots + a_n} = \frac{B(p)}{A(p)} \qquad (12.7) $$
The polynomials A and B are assumed to be coprime. Now introduce the operator

$$ \lambda = \frac{a}{p+a} = \frac{1}{1+p\tau}, \qquad \tau = 1/a > 0 \qquad (12.8) $$

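This allows us to make the following transformation

$$ G_0(p) = \frac{B(p)}{A(p)} = \frac{B^*(\lambda)}{A^*(\lambda)} = G_0^*(\lambda) \qquad (12.9) $$

with

$$ \begin{aligned} A^*(\lambda) &= 1 + \alpha_1\lambda + \alpha_2\lambda^2 + \dots + \alpha_n\lambda^n \\ B^*(\lambda) &= \beta_1\lambda + \beta_2\lambda^2 + \dots + \beta_n\lambda^n \end{aligned} \qquad (12.10) $$

An input-output model is easily formulated as

$$ A^*(\lambda)y(t) = B^*(\lambda)u(t) \qquad (12.11) $$

or, equivalently,

$$ y(t) = -\alpha_1[\lambda y](t) - \dots - \alpha_n[\lambda^n y](t) + \beta_1[\lambda u](t) + \dots + \beta_n[\lambda^n u](t) \qquad (12.12) $$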

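This is now a linear model of a dynamical system at all points of time. Notice that $[\lambda u]$, $[\lambda y]$, etc., denote filtered inputs and outputs. The parameters $\alpha_i, \beta_i$ may now be estimated with any method suitable for estimating parameters of a linear model. A reformulation of the model in Eq. (12.12) to a linear regression form is

$$ y(t) = \varphi_\tau^T(t)\theta_\tau \qquad (12.13) $$

where

$$ \theta_\tau = \begin{pmatrix} -\alpha_1 & -\alpha_2 & \cdots & -\alpha_n & \beta_1 & \cdots & \beta_n \end{pmatrix}^T \qquad (12.14) $$

and

$$ \varphi_\tau(t) = \begin{pmatrix} [\lambda y](t) & [\lambda^2 y](t) & \cdots & [\lambda^n y](t) & [\lambda u](t) & \cdots & [\lambda^n u](t) \end{pmatrix}^T \qquad (12.15) $$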
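As a numerical check of the transformation (12.9)-(12.10), the substitution $p = (1-\lambda)/(\tau\lambda)$ followed by multiplication with $(\tau\lambda)^n$ can be carried out coefficient by coefficient, since $p^{n-k}(\tau\lambda)^n = \tau^k\lambda^k(1-\lambda)^{n-k}$. The Python sketch below illustrates this under that normalization; the function name `s_to_lambda` is our own (hypothetical), not from the text. For a first-order system it reproduces $\alpha_1 = a_1\tau - 1$ and $\beta_1 = b_1\tau$, and for the second-order system of Eq. (12.75) with τ = 1 it gives $A^*(\lambda) = 1 - \lambda + 2\lambda^2$ and $B^*(\lambda) = \lambda$.

```python
from math import comb

import numpy as np


def s_to_lambda(coeffs, n, tau):
    """Map a polynomial in p of degree <= n to the lambda-domain.

    coeffs: c_0, ..., c_n of c_0*p**n + c_1*p**(n-1) + ... + c_n
            (use a leading 0 for B(p), whose degree is at most n - 1).
    Returns the coefficients of lambda**0, ..., lambda**n of
    (tau*lambda)**n * poly(p) with p = (1 - lambda) / (tau*lambda).
    """
    out = np.zeros(n + 1)
    for k, c in enumerate(coeffs):
        # c * p**(n-k) * (tau*lambda)**n = c * tau**k * lambda**k * (1 - lambda)**(n-k)
        for j in range(n - k + 1):
            out[k + j] += c * tau**k * comb(n - k, j) * (-1) ** j
    return out


# First-order case G_0(p) = b_1/(p + a_1) with a_1 = 2, b_1 = 1, tau = 0.3
print(s_to_lambda([1.0, 2.0], 1, 0.3))   # coefficients of A*(lambda) = 1 + alpha_1*lambda
print(s_to_lambda([0.0, 1.0], 1, 0.3))   # coefficients of B*(lambda) = beta_1*lambda
```

The noise polynomial $2s + 3$ of Eq. (12.75) maps in the same way, giving $2\lambda + \lambda^2$ for τ = 1.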

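We now have the following continuous-time input-output relations

$$ \begin{aligned} y(t) &= G_0(p)u(t) = G_0^*(\lambda)u(t) \\ y(t) &= \varphi_\tau^T(t)\theta_\tau \\ Y(s) &= \Phi_\tau^T(s)\theta_\tau, \quad \text{where } \Phi_\tau(s) = \mathcal{L}\{\varphi_\tau(t)\}(s) \end{aligned} \qquad (12.16) $$

where $\mathcal{L}\{\cdot\}$ denotes the Laplace transform. Finally, a Laplace transformation of Eq. (12.16) gives

$$ Y(s) = G_0^*(\lambda(s))U(s) \qquad (12.17) $$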
A particularly attractive feature is that the same linear relation holds in both the time domain and the frequency domain, without any approximation or selection of data.

Example 12.1—Estimation of two constant parameters 
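Consider the system with input u, output y, and the transfer operator $G_0$

$$ y(t) = G_0(p)u(t) = \frac{b_1}{p + a_1}\,u(t) \qquad (12.18) $$

which is expressed by means of the differential operator p. Using the operator transformation λ of Eq. (12.8) one obtains

$$ \lambda = \frac{1}{1 + p\tau} \qquad (12.8) $$

This gives the transformed model

$$ G_0^*(\lambda) = \frac{b_1\tau\lambda}{1 + (a_1\tau - 1)\lambda} = \frac{\beta_1\lambda}{1 + \alpha_1\lambda} \qquad (12.19) $$

A linear estimation model of the type (12.13) is given by

$$ y(t) = -\alpha_1[\lambda y](t) + \beta_1[\lambda u](t) = \varphi_\tau^T(t)\theta_\tau \qquad (12.20) $$

with

$$ \varphi_\tau(t) = \begin{pmatrix} [\lambda y](t) \\ [\lambda u](t) \end{pmatrix} \qquad (12.21) $$

and the parameter vector

$$ \theta_\tau = \begin{pmatrix} -\alpha_1 \\ \beta_1 \end{pmatrix} \qquad (12.22) $$

The original parameters are found via the relationships

$$ \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \frac{1}{\tau}\begin{pmatrix} \alpha_1 + 1 \\ \beta_1 \end{pmatrix} \qquad (12.23) $$

and the corresponding parameter estimates from

$$ \begin{pmatrix} \hat a_1 \\ \hat b_1 \end{pmatrix} = \frac{1}{\tau}\begin{pmatrix} \hat\alpha_1 + 1 \\ \hat\beta_1 \end{pmatrix} \qquad (12.24) $$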
Figure 12.2 Input u and output y of the process $\dot y = -ay + bu$ with a = 2 and b = 1 (upper left). Recursively estimated parameters are shown for τ = 0.3, h = 0.3 (upper right), τ = 0.3, h = 0.03 (lower left), and τ = 3.0, h = 0.3 (lower right). Both the sampling interval h and the operator time constant τ affect the convergence properties.


Sampling of all variables in Eq. (12.20) and application of the recursive least-squares estimation algorithm

$$ \begin{aligned} \hat\theta_\tau(k) &= \hat\theta_\tau(k-1) + P(k)\varphi_\tau(k)e(k) \\ e(k) &= y(k) - \varphi_\tau^T(k)\hat\theta_\tau(k-1) \\ P(k) &= P(k-1) - \frac{P(k-1)\varphi_\tau(k)\varphi_\tau^T(k)P(k-1)}{1 + \varphi_\tau^T(k)P(k-1)\varphi_\tau(k)} \end{aligned} \qquad (12.25) $$

is straightforward. The vector $\hat\theta_\tau(k)$ contains the parameter estimates, e(k) is the prediction error, and P(k) is the estimate of the covariance at recursion number k. Simulation results for different choices of the filter time constant τ and the sampling interval h are based on the input-output data of Fig. 12.2. All simulations were started with zero initial values for the parameter estimates and the filters. The simulations were performed with $a_1 = 2$ and $b_1 = 1$ and a moderate excitation by means of a square-wave input. The simulations in Fig. 12.2 indicate that the convergence works satisfactorily over a large range of values of τ. The estimates are accurate for all the simulation cases in Fig. 12.2, and recursive estimation performed with the sampling interval h = 0.03 s appears to have no limiting effect on the convergence rate. The convergence rate is faster for a shorter time constant τ, but if τ is too short the convergence transient may be violent.

The convergence rate with h = τ = 0.3 s is still good in Fig. 12.2, with a settling time of the same order of magnitude as the process time constant $1/a_1$.

Figure 12.3 A system containing a spring action k, a damping action d, and a mass m. The variable f(t) denotes the external forces on the system and x(t) the resulting position of the mass.
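The recursive scheme (12.25) applied to Example 12.1 can be sketched in Python as follows. Several choices here are our own assumptions, not the text's: the process and the λ filters are simulated by forward Euler with step 0.01 s, the square-wave input has unit amplitude and a period of 10 s, P(0) = 10⁶·I, and the data are noise-free. Because the regression (12.20) is a linear identity among states that all start at zero, the Euler-discretized signals satisfy it exactly, so the final estimates should agree closely with $a_1 = 2$, $b_1 = 1$; the transients are not meant to reproduce Fig. 12.2 in detail. The function names are hypothetical.

```python
import numpy as np


def simulate_example_12_1(a1=2.0, b1=1.0, tau=0.3, h=0.3, t_end=50.0, dt=0.01):
    """Forward-Euler simulation of dy/dt = -a1*y + b1*u and of the filtered
    signals [lambda y], [lambda u], lambda = 1/(1 + p*tau).
    Returns rows (y, [lambda y], [lambda u]) sampled every h seconds."""
    every = int(round(h / dt))                     # fine steps per sampling interval
    y = yf = uf = 0.0                              # zero initial values, as in the text
    rows = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        u = 1.0 if np.sin(2 * np.pi * t / 10.0) >= 0 else -1.0   # square wave
        if k % every == 0:
            rows.append((y, yf, uf))
        y, yf, uf = (y + dt * (-a1 * y + b1 * u),
                     yf + dt / tau * (y - yf),
                     uf + dt / tau * (u - uf))
    return np.array(rows)


def rls(phis, ys, p0=1e6):
    """Recursive least squares, Eq. (12.25), started from theta = 0, P = p0*I."""
    theta = np.zeros(phis.shape[1])
    P = p0 * np.eye(phis.shape[1])
    for phi, y in zip(phis, ys):
        e = y - phi @ theta                        # prediction error e(k)
        Pphi = P @ phi
        P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)   # P(k)
        theta = theta + P @ phi * e                # uses the updated P(k)
    return theta


def estimate_a_b(tau, h):
    rows = simulate_example_12_1(tau=tau, h=h)
    theta = rls(rows[:, 1:], rows[:, 0])           # theta_tau = (-alpha_1, beta_1)
    alpha1, beta1 = -theta[0], theta[1]
    return (alpha1 + 1.0) / tau, beta1 / tau       # Eq. (12.24)


for tau, h in [(0.3, 0.3), (0.3, 0.03), (3.0, 0.3)]:
    print(tau, h, estimate_a_b(tau, h))
```

The three (τ, h) pairs correspond to the three estimation panels of Fig. 12.2.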

Example 12.2—Estimation of a time-varying parameter 
Assume the spring coefficient k and the mass m to be well known and constant (see Fig. 12.3), whereas the damping coefficient d is unknown and time-varying. Assume that the external force f and the position x are measurable. The force f is assumed to be the control input variable. The transfer function from input f to output x is given by

$$ \frac{X(s)}{F(s)} = \frac{1/m}{s^2 + \frac{d}{m}s + \frac{k}{m}} = \frac{b_2}{s^2 + a_1 s + a_2} \qquad (12.26) $$

The operator translation in Eq. (12.8) gives the transformed transfer operator from force f to position x

$$ \frac{X(\lambda)}{F(\lambda)} = \frac{b_2\tau^2\lambda^2}{(1-\lambda)^2 + a_1\tau(1-\lambda)\lambda + a_2\tau^2\lambda^2} \qquad (12.27) $$

The unknown coefficient is $a_1$, for which we find the relation

$$ \varphi_\tau(t)\,a_1 = y(t) \qquad (12.28) $$

where

$$ \varphi_\tau(t) = \tau[\lambda(1-\lambda)x](t) \qquad (12.29) $$

$$ y(t) = -[(1-\lambda)^2 x](t) - a_2\tau^2[\lambda^2 x](t) + \tau^2 b_2[\lambda^2 f](t) \qquad (12.30) $$

Figure 12.4 Output y (upper) and input u (middle) for the second-order system with time-varying damping. True parameter $a_1$ and its estimate (lower), sampled with h = 0.5 s. Notice that the parameter estimate can track the true parameter for nonzero velocity only.

A simple discrete-time heuristic tracking algorithm for $a_1$ is the following threshold algorithm

$$ \hat a_1(kh) = \begin{cases} \hat a_1(kh - h), & |\varphi_\tau(kh)| < 0.1 \\ y(kh)/\varphi_\tau(kh), & |\varphi_\tau(kh)| \ge 0.1 \end{cases} \qquad k = 1, 2, \dots \qquad (12.31) $$

where the subthreshold updating is chosen in order to avoid division by small numbers. Some simulation results are presented in Fig. 12.4. As it is possible to use the estimated parameter $\hat a_1$ for real-time modification of the controller, the potential for adaptive control is obvious. ■
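The threshold rule (12.31) is short enough to state directly in code. The Python sketch below assumes that the sampled signals $\varphi_\tau(kh)$ and $y(kh)$ of Eqs. (12.29)-(12.30) are already available as arrays and that an initial guess for $a_1$ is supplied; the function and argument names are our own, and the default threshold 0.1 is the value used in the text.

```python
import numpy as np


def threshold_tracker(phi, y, a_init=0.0, threshold=0.1):
    """Track a_1 with the threshold rule (12.31).

    phi, y: sampled phi_tau(kh) and y(kh) of Eqs. (12.29)-(12.30).
    The estimate is updated to y/phi (solving phi*a_1 = y, Eq. (12.28)) only
    when |phi| >= threshold; otherwise the previous estimate is kept.
    """
    a_hat = np.empty(len(phi))
    a_prev = a_init                      # assumed initial guess for a_1
    for k, (p, yk) in enumerate(zip(phi, y)):
        if abs(p) >= threshold:
            a_prev = yk / p
        a_hat[k] = a_prev
    return a_hat


# Synthetic check: a slowly drifting a_1 and a regressor that passes through zero
t = np.linspace(0.0, 10.0, 101)
phi = np.sin(t)
a_true = 1.0 + 0.1 * t
a_est = threshold_tracker(phi, a_true * phi)
```

Between threshold crossings the estimate is frozen, which mirrors the remark in Fig. 12.4 that tracking is possible only when the regressor (there, the velocity) is nonzero.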
Figure 12.5 A system composed of two components represented by the states $x_1$ and $x_2$.

Example 12.3—Systems with interacting subsystems 
Consider the composite system of Fig. 12.5, consisting of several subsystems, which can be viewed as a system in closed-loop operation. Assume that the system equations are

$$ \begin{aligned} \frac{dx_1}{dt} &= a_{11}x_1(t) + a_{12}x_2(t) + u(t) \\ \frac{dx_2}{dt} &= a_{21}x_1(t) \\ y(t) &= x_2(t) \end{aligned} \qquad (12.32) $$

with states $x_1, x_2$ and coefficients $a_{11}, a_{12}, a_{21}$. An interesting question is how to estimate the dynamics of one subsystem based on observations from the interacting components $x_1$ and $x_2$. What dynamics can be expected of a subsystem if the interaction between $x_1$ and $x_2$ ceases? How should the system and the subsystems be identified from observations of u and y? The transfer function is

$$ Y(s) = \frac{a_{21}}{s^2 - a_{11}s - a_{21}a_{12}}\,U(s) \qquad (12.33) $$

Continuous-time modeling can be performed by introducing the low pass filter λ

$$ \lambda = \frac{1}{1+s\tau} \quad\Longleftrightarrow\quad s = \frac{1}{\tau}\,\frac{1-\lambda}{\lambda} \qquad (12.34) $$

The transformed model where λ has replaced s is

$$ \frac{1}{\tau^2}[(1-\lambda)^2 y](t) = a_{11}\Big[\frac{1}{\tau}(\lambda-\lambda^2)y\Big](t) + a_{21}a_{12}[\lambda^2 y](t) + a_{21}[\lambda^2 u](t) \qquad (12.35) $$

Implementation of $\lambda y$, $\lambda^2 y$, $\lambda^2 u$ and subsequent identification by, for example, least-squares identification is straightforward. Notice that $a_{11}$, $a_{21}$, and $a_{21}a_{12}$ are identifiable provided that sufficient input excitation is available. Hence, all model parameters $a_{11}, a_{21}, a_{12}$ are identifiable. ■
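Example 12.3 can be tried out numerically. The sketch below assumes the coefficient values $a_{11} = -1$, $a_{12} = -2$, $a_{21} = 1$ (our own choice, giving the stable denominator $s^2 + s + 2$), τ = 1, a random piecewise-constant input held for 1 s, and noise-free data; it forms the signals of Eq. (12.35) and solves for $a_{11}$, $a_{21}a_{12}$, $a_{21}$ by ordinary least squares. Forward Euler is used for simulation; since (12.35) is a linear identity among states that start at zero, the Euler-discretized signals satisfy it exactly, so the estimates should be essentially exact. The function names are hypothetical.

```python
import numpy as np


def simulate_example_12_3(a11=-1.0, a12=-2.0, a21=1.0, tau=1.0,
                          dt=0.01, t_end=100.0, seed=0):
    """Forward-Euler simulation of Eq. (12.32) together with the filters
    [lambda y], [lambda^2 y], [lambda u], [lambda^2 u], lambda = 1/(1 + s*tau).
    Returns rows (y, [lambda y], [lambda^2 y], [lambda^2 u]) at every step."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    hold = int(round(1.0 / dt))                     # input held constant for 1 s
    levels = rng.standard_normal(n_steps // hold + 1)
    x1 = x2 = ly = l2y = lu = l2u = 0.0             # zero initial state
    rows = []
    for k in range(n_steps):
        u = levels[k // hold]
        y = x2
        rows.append((y, ly, l2y, l2u))
        x1, x2, ly, l2y, lu, l2u = (
            x1 + dt * (a11 * x1 + a12 * x2 + u),
            x2 + dt * a21 * x1,
            ly + dt / tau * (y - ly),
            l2y + dt / tau * (ly - l2y),
            lu + dt / tau * (u - lu),
            l2u + dt / tau * (lu - l2u),
        )
    return np.array(rows)


def identify_example_12_3(rows, tau=1.0):
    """Least-squares fit of Eq. (12.35); returns estimates of a11, a21, a12."""
    y, ly, l2y, l2u = rows.T
    lhs = (y - 2.0 * ly + l2y) / tau**2             # (1/tau^2)[(1 - lambda)^2 y]
    X = np.column_stack([(ly - l2y) / tau,          # multiplies a11
                         l2y,                       # multiplies a21*a12
                         l2u])                      # multiplies a21
    (c11, c2112, c21), *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return c11, c21, c2112 / c21                    # a12 follows from a21*a12 and a21


a11_hat, a21_hat, a12_hat = identify_example_12_3(simulate_example_12_3())
print(a11_hat, a21_hat, a12_hat)
```

The final division shows concretely why $a_{12}$ is recoverable once $a_{21} \neq 0$ has been identified.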


Parameter transformations 

Before proceeding to signal processing aspects, we need to clarify the relationship between the parameters $\alpha_i, \beta_i$ of Eq. (12.10) and the original parameters $a_i, b_i$ of the transfer function in Eq. (12.7). Let the vector of original parameters be denoted by

$$ \theta = \begin{pmatrix} -a_1 & -a_2 & \cdots & -a_n & b_1 & \cdots & b_n \end{pmatrix}^T \qquad (12.36) $$

The relationship between Eq. (12.14) and Eq. (12.36) is then

$$ \theta_\tau = F_\tau\theta + G_\tau \qquad (12.37) $$

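Using the definition of λ in Eq. (12.8) and Eq. (12.10), it can be shown that the $2n \times 2n$ matrix $F_\tau$ is

$$ F_\tau = \begin{pmatrix} M_\tau & 0_{n\times n} \\ 0_{n\times n} & M_\tau \end{pmatrix} \qquad (12.38) $$

where

$$ M_\tau = \begin{pmatrix} m_{11} & 0 & \cdots & 0 \\ m_{21} & m_{22} & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}, \qquad m_{ij} = (-1)^{i-j}\binom{n-j}{i-j}\tau^j \quad (j \le i) \qquad (12.39) $$

Furthermore, the $2n \times 1$ vector $G_\tau$ is given by

$$ G_\tau = \begin{pmatrix} g_1 & \cdots & g_n & 0 & \cdots & 0 \end{pmatrix}^T, \qquad g_i = (-1)^{i+1}\binom{n}{i} \qquad (12.40) $$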

The matrix $F_\tau$ is invertible when $M_\tau$ is invertible, i.e., for all τ > 0, since the diagonal elements of the triangular matrix $M_\tau$ are $m_{ii} = \tau^i$. The parameter transformation is then one-to-one and

$$ \theta = F_\tau^{-1}(\theta_\tau - G_\tau) \qquad (12.41) $$

We may then conclude that the parameters $a_i, b_i$ of the continuous-time transfer function $G_0$ may be reconstructed from the parameters $\alpha_i, \beta_i$ of $\theta_\tau$ by means of basic matrix calculations. Alternatively, we may estimate the original parameters $a_i, b_i$ of θ from the linear relationships

$$ y(t) = \theta_\tau^T\varphi_\tau(t) = (F_\tau\theta + G_\tau)^T\varphi_\tau(t) \qquad (12.42) $$

or

$$ y(t) = \varphi_\tau^T(t)F_\tau\theta + \varphi_\tau^T(t)G_\tau \qquad (12.43) $$

where $F_\tau$ and $G_\tau$ are known matrices for each τ.

Hence, the parameter vectors θ and $\theta_\tau$ are related via known and simple linear relationships, so that translation between the two parameter vectors can be made without difficulty. Moreover, identification can be made with respect to either θ or $\theta_\tau$.
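Eqs. (12.38)-(12.41) translate into a few lines of Python. The sketch below builds $F_\tau$ and $G_\tau$ from the element formulas (12.39)-(12.40) (function names are our own, hypothetical) and checks them against Example 12.1, where $\theta_\tau = (1 - a_1\tau,\ b_1\tau)^T$, and against the second-order system of Eq. (12.75), for which direct substitution with τ = 1 gives $A^*(\lambda) = 1 - \lambda + 2\lambda^2$ and $B^*(\lambda) = \lambda$, i.e., $\theta_\tau = (1, -2, 1, 0)^T$.

```python
from math import comb

import numpy as np


def transformation_matrices(n, tau):
    """F_tau and G_tau of Eqs. (12.38)-(12.40): theta_tau = F_tau @ theta + G_tau,
    with theta = (-a_1, ..., -a_n, b_1, ..., b_n)."""
    M = np.zeros((n, n))
    for i in range(1, n + 1):
        for j in range(1, i + 1):                  # M_tau is lower triangular
            M[i - 1, j - 1] = (-1) ** (i - j) * comb(n - j, i - j) * tau**j
    Z = np.zeros((n, n))
    F = np.block([[M, Z], [Z, M]])
    g = np.array([(-1) ** (i + 1) * comb(n, i) for i in range(1, n + 1)], dtype=float)
    G = np.concatenate([g, np.zeros(n)])
    return F, G


def to_original(theta_tau, n, tau):
    """Eq. (12.41): theta = F_tau^{-1} (theta_tau - G_tau)."""
    F, G = transformation_matrices(n, tau)
    return np.linalg.solve(F, np.asarray(theta_tau, dtype=float) - G)


# System (12.75): a_1 = 1, a_2 = 2, b_1 = 1, b_2 = 1, with tau = 1
theta = np.array([-1.0, -2.0, 1.0, 1.0])
F, G = transformation_matrices(2, 1.0)
theta_tau = F @ theta + G
print(theta_tau)
print(to_original(theta_tau, 2, 1.0))
```

Solving the triangular system rather than forming $F_\tau^{-1}$ explicitly is a conventional numerical choice; for very small τ the diagonal entries $\tau^i$ shrink, so the back-transformation becomes less well conditioned.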


12.4 A NOISE MODEL 

Having treated the case of recursive identification, we now turn our attention to the general identification problem, which involves estimation in the presence of colored noise. Consider for modeling purposes the continuous-time system description

$$ A(s)Y(s) = B(s)U(s) + C(s)W(s) \qquad (12.44) $$

where A(s), B(s), C(s) are polynomials in the Laplace transform variable s and where it is required that $C^{-1}(s)A(s)$ is a stable transfer function.

Let h denote the sampling period used and assume that $\{w_k\}$ is a zero-mean, normally distributed white-noise sequence with covariance

$$ E\{w_i w_j^T\} = R\,\delta_{ij} \qquad (12.45) $$

In addition, we assume that $w(t) = w(kh)$ for $kh \le t < (k+1)h$, i.e., w(t) is assumed constant during the sampling interval. According to standard time series analysis it follows that the spectral density of $\{w_k\}$ is constant in the frequency range $[-\pi/h, \pi/h]$. As the Fourier transform of the sampled signal is periodic with a period equal to the sampling frequency $\omega_s = 2\pi/h$, the noise sequence $\{w_k\}$ might be concluded to have constant spectral density. A reason for modeling noise in this way is that a continuous-time white noise representation is obtained without adopting the concept of continuous-time white noise used for analysis of Brownian motion. In principle, this noise model is sufficient to describe any rational spectral density for the sampled noise process in the relevant frequency range $[-\pi/h, \pi/h]$. Application of the operator calculus gives the relationship

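$$ Y(s) = -\sum_{k=1}^{n}\alpha_k\lambda^k(s)Y(s) + \sum_{k=1}^{n}\beta_k\lambda^k(s)U(s) + \sum_{k=1}^{n}\gamma_k\lambda^k(s)W(s) + W(s) \qquad (12.46) $$

which can be separated into the parameter vector

$$ \theta_\tau^T = \begin{pmatrix} \alpha_1 & \cdots & \alpha_n & \beta_1 & \cdots & \beta_n & \gamma_1 & \cdots & \gamma_n \end{pmatrix} \qquad (12.47) $$

and

$$ \Phi_\tau(s) = \begin{pmatrix} -\lambda(s)Y(s) & \cdots & -\lambda^n(s)Y(s) & \lambda(s)U(s) & \cdots & \lambda^n(s)U(s) & \lambda(s)W(s) & \cdots & \lambda^n(s)W(s) \end{pmatrix}^T \qquad (12.48) $$

so that

$$ Y(s) = \Phi_\tau^T(s)\theta_\tau + W(s) \qquad (12.49) $$

After translation to the time domain and subsequent sampling and application of the z-transform to the corresponding discrete-time variable, we have

$$ Y(z) = \Phi_\tau^T(z)\theta_\tau + W(z) \qquad (12.50) $$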


As the noise is assumed constant during the sampling interval, it follows that each operator $\lambda^k(s)$ corresponds to the discrete-time transfer function

$$ \lambda_k(z) = \frac{Q'_k(z)}{P_k(z)} = \frac{Q_k(z)}{P_n(z)} \qquad (12.51) $$

where $Q'_k(z)/P_k(z) = Q_k(z)/P_n(z)$ is the zero-order-hold equivalent of the continuous-time system $\lambda^k(s)$ and where the denominator polynomial $P_n(z)$ is an nth-order polynomial with all roots at $z = \exp(-h/\tau)$. The zero-order-hold equivalent of $\lambda^k(s)$ is chosen because the disturbance w(t) is assumed to be constant during the sampling intervals. According to Eq. (12.50) we find that the contribution from W(z) to Y(z) is

$$ E(z) = C(z)W(z) = W(z) + \sum_{k=1}^{n}\gamma_k\frac{Q_k(z)}{P_n(z)}W(z) \qquad (12.52) $$

where

$$ C(z) = 1 + \gamma_1\lambda_1(z) + \dots + \gamma_n\lambda_n(z) \qquad (12.53) $$

This can be expressed in the form of an autoregressive moving-average model

$$ P_n(z)E(z) = P_n(z)W(z) + \sum_{k=1}^{n}\gamma_k Q_k(z)W(z) \qquad (12.54) $$
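The zero-order-hold equivalents $\lambda_k(z)$ of Eq. (12.51) can be computed without a general matrix exponential. The Python sketch below is one way to do it, under our own choice of realization: $\lambda^k(s)$ is written as a chain of k first-order sections, so the system matrix is $-aI$ plus a nilpotent part, and $e^{Ah}$ is then an exact finite sum. The returned $(\Phi, \Gamma, C)$ define $\lambda_k(z) = C(zI - \Phi)^{-1}\Gamma$; since Φ is lower triangular with all diagonal entries $e^{-h/\tau}$, every pole sits at $z = \exp(-h/\tau)$, as stated for $P_n(z)$. The function name is hypothetical.

```python
from math import factorial

import numpy as np


def zoh_lambda_power(order, tau, h):
    """Zero-order-hold equivalent of lambda^order(s) = 1/(1 + s*tau)**order.

    lambda^order is realized as a chain of first-order sections
    z_1' = a*(u - z_1), z_j' = a*(z_{j-1} - z_j), a = 1/tau, output z_order.
    Returns (Phi, Gamma, C) with z[k+1] = Phi z[k] + Gamma u[k], y[k] = C z[k].
    """
    a = 1.0 / tau
    A = -a * np.eye(order) + a * np.eye(order, k=-1)    # lower bidiagonal
    B = np.zeros(order)
    B[0] = a
    C = np.zeros(order)
    C[-1] = 1.0
    # A*h = -a*h*I + N with N nilpotent; the two parts commute, so
    # exp(A*h) = exp(-a*h) * sum_{j < order} N**j / j!  (finite and exact)
    N = a * h * np.eye(order, k=-1)
    exp_N = sum(np.linalg.matrix_power(N, j) / factorial(j) for j in range(order))
    Phi = np.exp(-a * h) * exp_N
    # Gamma = int_0^h exp(A s) ds B = A^{-1} (exp(A h) - I) B, A is invertible
    Gamma = np.linalg.solve(A, (Phi - np.eye(order)) @ B)
    return Phi, Gamma, C


Phi, Gamma, C = zoh_lambda_power(2, tau=1.0, h=0.1)
print(np.diag(Phi))   # both entries equal exp(-h/tau)
```

A library routine for zero-order-hold discretization would give the same result; the explicit form is shown here only because it makes the pole location visible.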


12.5 IDENTIFICATION 


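Let, as usual, $\hat\theta$ denote an estimate of the parameter vector $\theta_\tau$ and let $\tilde\theta = \hat\theta - \theta_\tau$ denote the parameter error. An objective of identification is to choose the optimal $\hat\theta$ according to some criterion. From Eq. (12.50) we suggest the residual model

$$ \varepsilon(\theta, z) = Y(z) - \Phi_\tau^T(z)\theta \qquad (12.55) $$

Let the residual sample covariance matrix be defined as

$$ R_N(\theta) = \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k(\theta)\varepsilon_k^T(\theta) \qquad (12.56) $$

where N denotes the number of data points and where $\{\varepsilon_k(\theta)\}_{k=1}^{N}$ is the residual sequence obtained from the relationships

$$ \begin{aligned} \varepsilon(\theta, z) &= Y(z) - \Phi_\tau^T(z)\theta = -\Phi_\tau^T(z)(\theta - \theta_\tau) + W(z) && \text{($z$-domain)} \\ \varepsilon_k(\theta) &= y_k - \varphi_k^T\theta = -\varphi_k^T(\theta - \theta_\tau) + w_k && \text{(time domain)} \end{aligned} \qquad (12.57) $$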


A goal of system identification is to minimize the model misfit according to some optimization criterion such as least-squares or maximum-likelihood optimization. From Eq. (12.57) we conclude that $\{\varepsilon_k(\theta_\tau)\} = \{w_k\}$, and maximum-likelihood optimization (e.g., in the case of normally distributed disturbances) is approached by maximizing the log-likelihood function

$$ \begin{aligned} \log L(\theta, R) &= -\frac{N}{2}\operatorname{tr}\big(R_N(\theta)R^{-1}\big) - \frac{N}{2}\log\det R + \text{constant} \\ &= -\frac{1}{2}\sum_{k=1}^{N}\varepsilon_k^T(\theta)R^{-1}\varepsilon_k(\theta) - \frac{N}{2}\log\det R + \text{constant} \end{aligned} \qquad (12.58) $$


where we have neglected transient effects due to initial values. By assuming R and θ to be independently parametrized and the covariance matrix R to be unknown, we have the partial derivatives

$$ \begin{aligned} \frac{\partial \log L(\theta,R)}{\partial\theta} &= -\sum_{k=1}^{N}\varepsilon_k^T(\theta)R^{-1}\frac{\partial\varepsilon_k(\theta)}{\partial\theta} \\ \frac{\partial \log L(\theta,R)}{\partial R_{ij}} &= \frac{N}{2}\,e_i^T R^{-1}R_N(\theta)R^{-1}e_j - \frac{N}{2}\,(R^{-1})_{ij} \end{aligned} \qquad (12.59) $$

where $e_i$ is a zero vector except for 1 in the ith position. It is clear from Eq. (12.59) that a stationary point of log L, with partial derivatives equal to zero, only appears for $R = R_N(\theta)$. This yields the covariance estimate

$$ \hat R = R_N(\hat\theta) \qquad (12.60) $$


We thus reduce the optimization problem by substitution of $R = R_N(\theta)$ in Eq. (12.58):

$$ \log L(\theta, R_N(\theta)) = -\frac{N}{2}\log\det R_N(\theta) + \text{constant} \qquad (12.61) $$

The maximum of Eq. (12.61) is also the minimum of

$$ V_N(\theta) = \log\det R_N(\theta) \qquad (12.62) $$

which has a unique minimum for $\theta = \hat\theta_N$ in the sense that

$$ \min_\theta V_N(\theta) = V_N(\hat\theta_N) = \log\det R_N(\hat\theta_N) \le \log\det\big(R_N(\hat\theta_N) + \Delta R\big) \qquad (12.63) $$

for any nonnegative definite matrix ΔR. Numerical optimization of Eq. (12.62) can now be approached by means of the Newton-Raphson method (see Appendix C), so that at iteration i one evaluates

$$ \hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \rho_i\big(\nabla^2 V_N(\hat\theta_N^{(i)})\big)^{-1}\nabla V_N(\hat\theta_N^{(i)}) \qquad (12.64) $$
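where $\{\rho_i\}$ is a sequence of step lengths chosen such that $\rho_i = 1$ or such that $V_N(\hat\theta_N^{(i+1)})$ is minimized. In the scalar case (see Appendix 12.2) the gradients can be evaluated by means of

$$ \frac{\partial\varepsilon_k(\theta)}{\partial\alpha_i} = \frac{1}{C(z)}\,y_k^{(i)}, \qquad \frac{\partial\varepsilon_k(\theta)}{\partial\beta_i} = -\frac{1}{C(z)}\,u_k^{(i)}, \qquad \frac{\partial\varepsilon_k(\theta)}{\partial\gamma_i} = -\frac{1}{C(z)}\,\varepsilon_k^{(i)} \qquad (12.65) $$

where we designate the low pass filtered and sampled signals as follows

$$ u_k^{(i)} = [\lambda^i u](t)\big|_{t=kh}, \qquad y_k^{(i)} = [\lambda^i y](t)\big|_{t=kh}, \qquad i = 0, 1, \dots, n, \quad k = 1, 2, \dots, N \qquad (12.66) $$

with $\varepsilon_k^{(i)}$ denoting the residuals filtered by $\lambda_i(z)$ of Eq. (12.51), and we approximate (see Appendix 12.2)

$$ \nabla V_N(\theta) = \frac{2}{N}\sum_{k=1}^{N}\frac{\varepsilon_k(\theta)\psi_k(\theta)}{R_N(\theta)}, \qquad \nabla^2 V_N(\theta) \approx \frac{2}{N}\sum_{k=1}^{N}\frac{\psi_k(\theta)\psi_k^T(\theta)}{R_N(\theta)}, \qquad \psi_k(\theta) = \frac{\partial\varepsilon_k(\theta)}{\partial\theta} \qquad (12.67) $$

At some distance from the optimum, where $\nabla^2 V_N$ may become poorly conditioned, it is standard practice to replace the Newton-Raphson procedure by some quasi-Newton method; i.e., $\nabla^2 V_N$ in the algorithm in Eq. (12.64) is replaced by some positive definite matrix (see Appendix C).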
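The Newton-Raphson iteration (12.64) combined with the approximation (12.67) is the familiar Gauss-Newton method. A generic Python sketch for the scalar case follows. The residual and Jacobian callbacks (`residuals`, `jacobian`, hypothetical names) must be supplied by the user; in the setting of this chapter they would come from filtering the data through $1/C(z)$ as in Eq. (12.65), which the sketch does not implement. For a model that is linear in θ, one full step ($\rho_i = 1$) lands on the least-squares solution, which gives a simple check.

```python
import numpy as np


def gauss_newton_logdet(residuals, jacobian, theta0, n_iter=10, step=1.0):
    """Minimize V_N(theta) = log R_N(theta), R_N = (1/N) sum eps_k**2 (scalar case).

    Iteration (12.64) with the gradient and Gauss-Newton Hessian of (12.67):
        grad = (2/N) sum eps_k psi_k / R_N,  hess ~= (2/N) sum psi_k psi_k^T / R_N,
    where psi_k = d eps_k / d theta. residuals(theta) returns the N residuals,
    jacobian(theta) the N x d matrix with rows psi_k^T (user-supplied).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        eps = residuals(theta)
        psi = jacobian(theta)
        n = len(eps)
        r_n = eps @ eps / n                          # R_N(theta), Eq. (12.56)
        grad = 2.0 / n * psi.T @ eps / r_n
        hess = 2.0 / n * psi.T @ psi / r_n           # positive semidefinite by construction
        theta = theta - step * np.linalg.solve(hess, grad)
    return theta


# Check on a model that is linear in theta: eps_k = y_k - x_k^T theta, psi_k = -x_k
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.standard_normal(200)
theta_gn = gauss_newton_logdet(lambda th: y - X @ th, lambda th: -X,
                               np.zeros(3), n_iter=1)
print(theta_gn)
```

Note that $R_N$ cancels in the Gauss-Newton step; it matters only for the gradient magnitude and for the multivariable log det criterion.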


12.6 CONVERGENCE AND CONSISTENCY 

From Eq. (12.57) we conclude that $\varepsilon_k(\theta_\tau) = w_k$ for all k, so that

$$ \lim_{N\to\infty} R_N(\theta_\tau) = E\{\varepsilon_k(\theta_\tau)\varepsilon_k^T(\theta_\tau)\} = E\{w_k w_k^T\} = R \qquad (12.68) $$

It follows that $\hat\theta_N = \theta_\tau$ is a possible estimate as N increases. However, in order to show that there are no other minimizing elements it is necessary to make the following additional assumptions:

• The identified system is appropriately and uniquely parametrized, so that no other parameter vector $\theta' \neq \theta_\tau$ also describes the same input-output relationship.
• The experimental conditions and the input signal are chosen with appropriate excitation properties so that $\nabla^2 V(\theta_\tau)$ is nonsingular.
• The input sequence $\{u_k\}$ is uncorrelated with the disturbance sequence $\{w_k\}$.

If these conditions are satisfied, we may conclude that $\hat\theta_N$ is a consistent estimate according to the proofs for consistency of maximum-likelihood and prediction error identification. A Taylor series expansion with two terms gives

$$ 0 = \nabla V_N(\hat\theta_N) \approx \nabla V_N(\theta_\tau) + \nabla^2 V_N(\theta_\tau)(\hat\theta_N - \theta_\tau) \qquad (12.69) $$

where higher-order terms can be neglected, as it is known from the consistency properties that $\hat\theta_N \to \theta_\tau$ as $N \to \infty$. The covariance of the parameter estimate $\hat\theta_N$ can thus be estimated as

$$ E\{(\hat\theta_N - \theta_\tau)(\hat\theta_N - \theta_\tau)^T\} \approx E\big\{(\nabla^2 V_N(\theta_\tau))^{-1}\nabla V_N(\theta_\tau)(\nabla V_N(\theta_\tau))^T(\nabla^2 V_N(\theta_\tau))^{-1}\big\} \qquad (12.70) $$

which may serve as an estimate of the accuracy of $\hat\theta_N$. A covariance estimate can be computed as

$$ \hat P_N = \big(\nabla^2 V_N(\hat\theta_N)\big)^{-1} I_0 \big(\nabla^2 V_N(\hat\theta_N)\big)^{-1} \qquad (12.71) $$

where

$$ I_0 = \frac{4}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N}\psi_i(\hat\theta_N)R_N^{-1}(\hat\theta_N)\varepsilon_i(\hat\theta_N)\varepsilon_k^T(\hat\theta_N)R_N^{-1}(\hat\theta_N)\psi_k^T(\hat\theta_N) \qquad (12.72) $$

The estimate $\hat\theta_N$ thus obtained is asymptotically normally distributed in the sense that

$$ \sqrt{N}(\hat\theta_N - \theta_\tau) \qquad (12.73) $$

converges in distribution to $\mathcal{N}(0, P)$, where

$$ P = \Big(E\{\psi_k(\theta_\tau)R^{-1}\psi_k^T(\theta_\tau)\}\Big)^{-1} \qquad (12.74) $$

This result is similar to Eq. (6.93), which applies to maximum-likelihood methods for ARMAX models; it is the Cramér-Rao lower bound (see Appendix 12.1), and we also conclude that $\hat\theta_N$ is asymptotically efficient.
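In the scalar case, Eq. (12.74) suggests the practical covariance estimate $\operatorname{Cov}(\hat\theta_N) \approx \hat P/N$ with $\hat P = \big(N^{-1}\sum_k \psi_k\psi_k^T/R_N(\hat\theta_N)\big)^{-1}$, i.e., the expectation replaced by a sample mean. A minimal Python sketch, assuming that the gradients $\psi_k$ and the residuals at the optimum are already available (the function name is hypothetical); for a linear regression $\varepsilon_k = y_k - x_k^T\theta$, so $\psi_k = -x_k$, it reduces to the familiar $R_N(X^TX)^{-1}$.

```python
import numpy as np


def asymptotic_covariance(psi, eps):
    """Scalar-output estimate Cov(theta_hat_N) ~= P/N, where
    P = (E{psi_k R^{-1} psi_k^T})^{-1} as in Eq. (12.74), with the expectation
    replaced by a sample mean and R by R_N(theta_hat_N) as in Eq. (12.60).

    psi: N x d matrix with rows psi_k^T evaluated at theta_hat_N.
    eps: the N residuals eps_k(theta_hat_N).
    """
    n = len(eps)
    r_n = eps @ eps / n
    p_hat = np.linalg.inv(psi.T @ psi / (n * r_n))
    return p_hat / n


rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
y = X @ np.array([0.5, -1.0]) + 0.2 * rng.standard_normal(500)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
cov = asymptotic_covariance(-X, y - X @ theta_hat)
print(np.sqrt(np.diag(cov)))   # approximate standard errors of theta_hat
```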

Example 12.4—Continuous-time and discrete-time methods 

The data in Fig. 12.6 have been generated from a system with the transfer function relationship

$$ S:\quad Y(s) = \frac{s+1}{s^2+s+2}\,U(s) + \frac{2s+3}{s^2+s+2}\,W(s) \qquad (12.75) $$
Figure 12.7 The impulse response, step response, and Bode diagram of the system (12.75): the correct value (solid line) and the estimates by means of continuous-time identification (dashed line) and ARMAX-model identification (dotted line).


The simulation in Fig. 12.6 shows time records when zero-mean white noise with variance $\sigma^2 = 0.01$ is corrupting the output y, with a signal-to-noise ratio S/N = 100. The simulations have been run with h = 0.1 s and τ = 1.0 s, and Fig. 12.7 shows the frequency response, step response, and impulse response for the system and the two identified models. The operator translation $\lambda = 1/(s+1)$ gives

$$ Y(s) = \frac{\lambda(s)}{1 - \lambda(s) + 2\lambda^2(s)}\,U(s) + \frac{2\lambda(s) + \lambda^2(s)}{1 - \lambda(s) + 2\lambda^2(s)}\,W(s) \qquad (12.76) $$

Application of the proposed method gives the estimated transfer function relationship

$$ \mathcal{M}:\quad Y(s) = \frac{0.9670s + 0.7696}{s^2 + 0.9386s + 1.6941}\,U(s) + \frac{s^2 + 2.1154s + 1.4033}{s^2 + 0.9386s + 1.6941}\,W(s) \qquad (12.77) $$
A comparison with standard discrete-time identification based on the N = 300 sampled values of input u and output y is interesting. Some performance indices are

Method       R_N(θ̂)    θ̃ᵀθ̃     e_step    e_impulse    ‖G − G_0‖²
ARMAX        0.0147    0.359    0.108     0.0227       0.0107
Cont. id.    0.0099    0.152    0.019     0.0076       0.0036

where $e_{\text{step}}$ and $e_{\text{impulse}}$ are evaluated for u being a unit step input and an impulse, respectively. Errors are evaluated over a time interval of 10 s according to Fig. 12.7, with

$$ \begin{aligned} \tilde\theta &= \hat\theta - \theta && \text{parameter error} \\ e_{\text{step}} &= \int_0^{10}\big(y(\hat\theta) - y(\theta)\big)^2\,dt && \text{step response error} \\ e_{\text{impulse}} &= \int_0^{10}\big(y(\hat\theta) - y(\theta)\big)^2\,dt && \text{impulse response error} \\ \|G - G_0\|^2 &= \int_0^{10}\big|G(\hat\theta, i\omega) - G(\theta, i\omega)\big|^2\,d\omega && \text{Bode diagram error} \end{aligned} \qquad (12.78) $$

Notice that this comparison is favorable for the proposed continuous-time method, which provides the more accurate step response, impulse response, and frequency response for a similar value of the loss function of optimization.

The residual sample variance 0.0099 is close to the expected variance 0.01, whereas the ARMAX method does not achieve this bound. Moreover, the ARMAX residual sample variance approaches this value if the model order is allowed to increase. However, the qualitative difference between the responses remains in such a case. ■


12.7 *STATE-SPACE TRANSFORMATIONS 

It is of great importance that no information is lost in the operator transformation. This is not obvious from the original approach with state-space filters, where a filtered state variable could only approximate the true state variable due to the low pass filter properties. In this section we will show that there is a one-to-one mapping between the state space associated with the original system description and that of the transformed description.
Consider therefore the transfer function

$$ G_0(s) = \frac{b_1 s^{n-1} + \dots + b_n}{s^n + a_1 s^{n-1} + \dots + a_n} \qquad (12.79) $$


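The controllable canonical form of Eq. (12.79) with a state vector x and the differential operator p may be written as

$$ p\begin{pmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{pmatrix} = \begin{pmatrix} -a_1 & -a_2 & \cdots & -a_{n-1} & -a_n \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}u(t), \qquad y(t) = \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix}x(t) \qquad (12.80) $$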


This may be associated with the fractional form

$$ \begin{cases} A(p)\xi(t) = u(t) \\ y(t) = B(p)\xi(t) \end{cases} \qquad (12.81) $$

with ξ as a scalar internal variable (sometimes called the partial state). The components $x_i$ of the state vector x may now be related to ξ via the correspondence

$$ x_i(t) = p^{n-i}\xi(t), \qquad i = 1, 2, \dots, n \qquad (12.82) $$

The representation in Eq. (12.80) is sufficient to describe the dynamics of the identification object, but the introduced state variable filters increase the order of the system comprising both the identification object and the filters. The filters will thus increase the minimal order of the system. It is possible to find a state space of order 2n that describes both the process and the filters, although the realization is nonminimal:

$$ \begin{cases} A'(p)\xi'(t) = u(t) \\ y(t) = B'(p)\xi'(t) \end{cases} \qquad (12.83) $$

with polynomials

$$ \begin{aligned} A'(p) &= A(p)(p+a)^n = p^{2n} + a'_1 p^{2n-1} + \dots + a'_{2n} \\ B'(p) &= B(p)(p+a)^n = b'_1 p^{2n-1} + \dots + b'_{2n} \end{aligned} \qquad (12.84) $$
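A state-space realization is given by

$$ \frac{d}{dt}\begin{pmatrix} x'_1(t) \\ \vdots \\ x'_{2n}(t) \end{pmatrix} = \begin{pmatrix} -a'_1 & \cdots & -a'_{2n-1} & -a'_{2n} \\ 1 & & & 0 \\ & \ddots & & \vdots \\ 0 & & 1 & 0 \end{pmatrix}\begin{pmatrix} x'_1(t) \\ \vdots \\ x'_{2n}(t) \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}u(t) \qquad (12.85) $$

with

$$ x'_i(t) = p^{2n-i}\xi'(t), \qquad i = 1, 2, \dots, 2n \qquad (12.86) $$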


Each of the components of $\varphi_\tau$ may now be expressed as a linear combination of the state vector components. With the arguments of Eqs. (12.8), (12.84), and (12.86) we have, for example,

$$ \begin{aligned} [\lambda u](t) &= \frac{a}{p+a}A(p)(p+a)^n\xi'(t) = a\big(p^{2n-1}\xi'(t)\big) + \dots + a^n a_n\big(p^0\xi'(t)\big) = a\,x'_1(t) + \dots + a^n a_n\,x'_{2n}(t) \\ [\lambda^n y](t) &= a^n\begin{pmatrix} 0 & \cdots & 0 & b_1 & \cdots & b_n \end{pmatrix}x'(t) = \frac{B(p)}{A(p)}\Big(\frac{a}{p+a}\Big)^n u(t) \end{aligned} \qquad (12.87) $$

The original state vector x of Eq. (12.82) is related to x' as follows:

$$ (p+a)^n\xi'(t) = \xi(t), \qquad x_i(t) = p^{n-i}\xi(t) = p^{n-i}(p+a)^n\xi'(t) \qquad (12.88) $$

From Eq. (12.84) and Eq. (12.88) we find

$$ x_i(t) = \sum_{j=0}^{n}\binom{n}{j}a^{n-j}x'_{n+i-j}(t), \qquad i = 1, 2, \dots, n \qquad (12.89) $$

Consider now the full regression vector

$$ \varphi_\tau(t) = \begin{pmatrix} [\lambda y](t) & \cdots & [\lambda^n y](t) & [\lambda u](t) & \cdots & [\lambda^n u](t) \end{pmatrix}^T \qquad (12.90) $$

The vector $\varphi_\tau$ is therefore related to the state vector x' by a linear transformation matrix M' containing coefficients obtained from Eq. (12.89) and Eq. (12.85):

$$ \varphi_\tau(t) = M'x'(t) \qquad (12.91) $$
Notice that all components of the state space are observable from $[\lambda^n y]$ provided there are no common factors of A and B. This means that the states $x'(t)$ and $x(t)$ are observable from $\varphi_\tau$. From the construction of Eq. (12.85) we also see that the state x' is controllable from u provided there are no common factors of A and B. Nor should there be any factor of (p + a) in B. This means that in principle it is possible to determine an input u such that x' attains any direction in the 2n-dimensional space, so that no information is irreversibly lost in the filtering process. The following theorem can be shown.

Theorem 12.1 

Let G be a rational function such that

$$ G(p) = \frac{B(p)}{A(p)}\Big(\frac{p+a}{p+a}\Big)^n, \qquad \deg(A) = n;\ \deg(B) = m \le n-1 \qquad (12.92) $$

where the polynomial factorization is such that B has no common factor with A or (p + a). Let the following strictly proper transfer operator relationship hold between input u and output y:

$$ y(t) = G(p)u(t) \qquad (12.93) $$

Let λ be the operator

$$ \lambda = \frac{a}{p+a} $$

Let $\varphi_\tau$ be the vector of filtered inputs and outputs

$$ \varphi_\tau(t) = \begin{pmatrix} [\lambda u](t) & \cdots & [\lambda^n u](t) & [\lambda y](t) & \cdots & [\lambda^n y](t) \end{pmatrix}^T \qquad (12.94) $$

and let x' be the state vector of the controllable canonical form of G. Then there exists a linear transformation such that

$$ x'(t) = T_\tau\varphi_\tau(t) \qquad (12.95) $$

for an invertible matrix $T_\tau$.

Proof: See Appendix 12.3. 

Remark: The theorem above has shown that $\varphi_\tau$ is a sufficient state vector for the filter state and the system to be identified. The controllability of x' and $\varphi_\tau$ means that any direction in the 2n-dimensional space can be reached. Active improvement of identifiability by choice of the input u is also possible.
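Theorem 12.1 can be checked numerically for given A, B, and a by building the matrix M' of Eq. (12.91) row by row. Assuming the state ordering $x'_i = p^{2n-i}\xi'(t)$, we have $[\lambda^k y] = a^k B(p)(p+a)^{n-k}\xi'$ and $[\lambda^k u] = a^k A(p)(p+a)^{n-k}\xi'$, so each row simply holds polynomial coefficients. In the Python sketch below (function names are our own, hypothetical), M' has full rank 2n for the coprime pair of Eq. (12.75) and loses rank when A and B share a factor, in line with the coprimeness condition of the theorem.

```python
import numpy as np


def poly_shift_power(a, m):
    """Coefficients (highest power first) of (p + a)**m."""
    c = np.array([1.0])
    for _ in range(m):
        c = np.convolve(c, [1.0, a])
    return c


def regressor_state_map(A, B, a):
    """Matrix M' with phi_tau(t) = M' x'(t), ordered as in Eq. (12.90).

    A: coefficients of the monic A(p) (length n + 1, highest power first)
    B: coefficients of B(p) (length n, highest power first, leading zeros allowed)
    Assumes x'_i = p**(2n - i) xi', so the coefficient of p**(2n - i) in a row
    polynomial is the weight of x'_i.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = len(A) - 1
    rows = []
    for poly in (B, A):          # [lambda^k y] rows first, then [lambda^k u]
        for k in range(1, n + 1):
            # [lambda^k y] = a^k B(p)(p+a)^(n-k) xi',  [lambda^k u] = a^k A(p)(p+a)^(n-k) xi'
            c = a**k * np.convolve(poly, poly_shift_power(a, n - k))
            rows.append(np.concatenate([np.zeros(2 * n - len(c)), c]))
    return np.array(rows)


# System (12.75): A(p) = p^2 + p + 2, B(p) = p + 1, with a = 1/tau = 1
M = regressor_state_map([1, 1, 2], [1, 1], 1.0)
print(np.linalg.matrix_rank(M))
# A(p) = (p + 2)(p + 3) and B(p) = p + 2 share a factor
M_common = regressor_state_map([1, 5, 6], [1, 2], 1.0)
print(np.linalg.matrix_rank(M_common))
```

When M' is invertible, its inverse (up to the reordering between Eqs. (12.90) and (12.94)) plays the role of $T_\tau$ in Eq. (12.95).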
In the previous sections we have seen that the transfer operator may be exactly transformed to the linear model

$$ y(t) = \varphi_\tau^T(t)\theta_\tau \qquad (12.13) $$

with

$$ \varphi_\tau(t) = \begin{pmatrix} [\lambda y](t) & [\lambda^2 y](t) & \cdots & [\lambda u](t) & [\lambda^2 u](t) & \cdots \end{pmatrix}^T \qquad (12.15) $$

Sometimes it may be desirable to make some data selection. Let us denote such a data selection by the filter f in the time domain or F in the frequency domain, and let the subscript f denote a signal filtered by f or F. The estimation algorithm will then fit parameters to data from the relationships

$$ y_f(t) = \varphi_f^T(t)\theta_\tau \qquad (12.96) $$

or

$$ Y_f(s) = \Phi_f^T(s)\theta_\tau \qquad (12.97) $$

This means that we have the possibility of performing filtering operations in the time domain, in the frequency domain, or in both. Filtering operations in the frequency domain will entail weightings and selections in certain frequency ranges. Time domain filtering will mean choices of recording times, averaging, or sampling.

An interesting possibility is hybrid identification, which consists of sampling y(t) and all components of φ_λ(t) at a certain sequence of time instants t = t_1, t_2, .... The linear relationship (12.13) then still holds between y and φ_λ. These sampled data may now be used to fit parameters to the continuous-time model in Eq. (12.13) or Eq. (12.96) by using ordinary discrete-time recursive estimation methods. Notice that there is no discrete-time model involved, although we use sampled data and discrete-time estimation. It is also interesting that such data sampling does not need to be periodic and that “slow sampling” may be used if a lower convergence rate can be accepted. Notice also that in principle the sampling for constant parameters θ_λ may be performed without any anti-aliasing filter. This is due to the fact that θ_λ rather than y is the reconstructed entity. It would, however, be necessary to choose the sampling frequency properly when tracking a time-varying θ_λ.
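The following is a minimal sketch of hybrid identification. It assumes a hypothetical first-order plant G_0(s) = b/(s + a_0), so that with λ = a/(p + a) the model (12.13) reads y = (1 - a_0/a)[λy] + (b/a)[λu]. The plant and the filter λ are simulated with the same forward-Euler step, which makes this regression hold exactly on the simulation grid. This is a simplification: real data would carry noise and integration errors. The samples are then taken at random, non-periodic instants and passed to ordinary recursive least squares. All function names and numerical values are illustrative assumptions.

```python
import numpy as np

def simulate(a0, b, a, u, dt):
    """Forward-Euler simulation of y' = -a0*y + b*u and of the filter
    lambda = a/(p + a) applied to y and u, all started from rest."""
    n = len(u)
    y, ly, lu = np.zeros(n), np.zeros(n), np.zeros(n)
    for k in range(n - 1):
        y[k + 1] = y[k] + dt * (-a0 * y[k] + b * u[k])
        ly[k + 1] = ly[k] + dt * a * (y[k] - ly[k])   # [lambda y](t)
        lu[k + 1] = lu[k] + dt * a * (u[k] - lu[k])   # [lambda u](t)
    return y, ly, lu

def rls(phi, y, p0=1e6):
    """Ordinary discrete-time recursive least squares over the rows of phi."""
    theta = np.zeros(phi.shape[1])
    P = p0 * np.eye(phi.shape[1])
    for phik, yk in zip(phi, y):
        Pphi = P @ phik
        gain = Pphi / (1.0 + phik @ Pphi)
        theta = theta + gain * (yk - phik @ theta)
        P = P - np.outer(gain, Pphi)
    return theta

# Hypothetical plant, filter and input
a0, b, a, dt = 2.0, 3.0, 5.0, 1e-3
rng = np.random.default_rng(0)
u = np.repeat(rng.standard_normal(100), 200)      # piecewise-constant input, 20 s
y, ly, lu = simulate(a0, b, a, u, dt)

# Non-periodic sampling instants; no anti-aliasing filter is used
idx = np.sort(rng.choice(len(u), size=300, replace=False))
phi = np.column_stack((ly[idx], lu[idx]))
theta = rls(phi, y[idx])

# theta estimates (1 - a0/a, b/a); map back to the continuous-time parameters
a0_hat = a * (1.0 - theta[0])
b_hat = a * theta[1]
print(a0_hat, b_hat)
```

The estimated quantities are the continuous-time parameters a_0 and b, even though only irregularly sampled values of y, [λy] and [λu] were used.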
Choice of the low pass filter λ

A practical issue is to consider the choice of the time constant τ = 1/a of the low pass filter λ used in the modeling with the input-output relationship

$$ Y(s) = G_0(s)U(s) = G_0(\lambda(s))U(s) = \Phi_\lambda^T(s)\,\theta_\lambda \qquad (12.98) $$

The accuracy of a parameter estimate θ̂_λ can be evaluated, for instance, with a quadratic criterion

$$ J_\lambda(\hat\theta_\lambda(t)) = \int_0^t \left( y_f(\tau) - \varphi_f^T(\tau)\,\hat\theta_\lambda(t) \right)^2 d\tau \qquad (12.99) $$

It is also possible to use quadratic criteria in the frequency domain. We will make statements for “long” but finite time intervals [0, t] and assume that the Parseval relation holds between the time domain and the frequency domain. A counterpart to Eq. (12.99) in the frequency domain is then

$$ J_\lambda(\hat\theta_\lambda(t)) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \left| Y_f(i\omega) - \Phi_f^T(i\omega)\,\hat\theta_\lambda(t) \right|^2 d\omega \qquad (12.100) $$

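By introducing the parameter error vector as

$$ \tilde\theta_\lambda(t) = \hat\theta_\lambda(t) - \theta_\lambda \qquad (12.101) $$

and the weighting matrix

$$ P_\lambda^{-1}(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \Phi_\lambda(-i\omega)\,\Phi_\lambda^T(i\omega)\, d\omega \qquad (12.102) $$

with Φ_λ(s) = L{φ_λ(t)}, and where t refers to the measurement duration, the optimization criterion Eq. (12.100) can be rewritten as

$$ J_\lambda = \tilde\theta_\lambda^T \left( \frac{1}{2\pi}\int_{-\infty}^{+\infty} \Phi_\lambda(-i\omega)\,\Phi_\lambda^T(i\omega)\, d\omega \right) \tilde\theta_\lambda = \tilde\theta_\lambda^T P_\lambda^{-1}(t)\,\tilde\theta_\lambda \qquad (12.103) $$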

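All components of Φ_λ are dependent on the input U(s). From Eqs. (12.13) to (12.16) it is found that the vector Φ_λ may be reduced to

$$ \Phi_\lambda(s) = \Gamma_\lambda^T(s)\,U(s) \qquad (12.104) $$

with

$$ \Gamma_\lambda(s) = \begin{pmatrix} \lambda(s)G_0(s) & \cdots & \lambda^n(s)G_0(s) & \lambda(s) & \cdots & \lambda^n(s) \end{pmatrix} \qquad (12.105) $$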




Sec. 12.8 Signal processing fillers 

Combining Eq. (12.104) with Eq. (12.102), we see that P_λ depends on the spectrum of the input signal u. There is also a dependence on the unknown transfer function G_0. It is therefore difficult to derive any result indicating how to choose τ optimally on the basis of this type of pure quadratic criterion.

Another approach is to demand a certain convergence rate of the parameter estimates. This may be achieved by the following modification to a weighted least-squares criterion

$$ J_\lambda^\alpha(\hat\theta_\lambda(t)) = \int_0^t e^{2\alpha\tau} \left( y_f(\tau) - \varphi_f^T(\tau)\,\hat\theta_\lambda(t) \right)^2 d\tau \qquad (12.106) $$

where α > 0 is some constant rate of desired exponential convergence. The weighting matrix Eq. (12.102) modifies to

$$ P_\lambda^{-1}(t) = \int_0^t e^{2\alpha\tau}\, \varphi_f(\tau)\,\varphi_f^T(\tau)\, d\tau \qquad (12.107) $$

when evaluated in the time domain. The frequency domain counterpart of Eq. (12.106) is


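$$ J_\lambda^\alpha(\hat\theta_\lambda(t)) = \frac{1}{2\pi i}\int_{0-i\infty}^{0+i\infty} \left| Y_f(s-\alpha) - \Phi_f^T(s-\alpha)\,\hat\theta_\lambda(t) \right|^2 ds \qquad (12.108) $$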

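By examination of the integrand of Eq. (12.108) we find that its convergence properties are related to the properties of Γ_λ(s) and U(s), Γ_λ(s) being in turn dependent on λ(s) and G_0(s). We then find

$$ P_\lambda^{-1}(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \Gamma_\lambda^T(-\alpha - i\omega)\,U(-\alpha - i\omega)\,U^T(-\alpha + i\omega)\,\Gamma_\lambda(-\alpha + i\omega)\, d\omega \qquad (12.109) $$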

For a nonzero input U(s) we have the following conditions for convergence of Eq. (12.109):

• G_0(s - α) stable
• λ(s - α) stable ⇒ τ < 1/α

This determines the limits of convergence rates for different parameter estimations. It means that we have to require that G_0 is stable with all its poles to the left of Re s = -α, that is, that G_0 responds rapidly enough to the input u. It is also necessary that the filter time constant τ be smaller than the desired time constant of convergence 1/α.
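The two conditions can be turned into an upper limit on the attainable convergence rate. The small sketch below assumes that G_0 is specified by a hypothetical list of its poles. G_0(s - α) is stable when every pole p of G_0 satisfies Re p < -α, and λ(s - α) = a/(s - α + a) is stable when α < a = 1/τ.

```python
import numpy as np

def max_convergence_rate(plant_poles, tau):
    """Supremum of the rates alpha for which G0(s - alpha) and lambda(s - alpha) are stable.

    G0(s - alpha) has its poles at p + alpha, so alpha < -max Re(p);
    lambda(s - alpha) = a / (s - alpha + a) has its pole at alpha - a, so alpha < a = 1/tau.
    Any admissible alpha must be strictly smaller than the returned value.
    """
    poles = np.asarray(plant_poles, dtype=complex)
    margin = -np.max(poles.real)
    if margin <= 0:
        raise ValueError("G0 must be stable")
    return min(margin, 1.0 / tau)

# Poles -2 and -0.5 +/- 1j with tau = 0.5: the plant (margin 0.5),
# not the filter (1/tau = 2), limits the convergence rate.
print(max_convergence_rate([-2.0, -0.5 + 1j, -0.5 - 1j], tau=0.5))   # 0.5
```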
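Implementation of least-squares estimation

Let us now consider recursive least-squares estimation of parameters of the linear model. A minimization of Eq. (12.99) in the continuous-time domain gives an algorithm of the type

$$ \begin{aligned} \dot{\hat\theta}_\lambda(t) &= P_c(t)\,\varphi_\lambda(t)\left( y(t) - \varphi_\lambda^T(t)\,\hat\theta_\lambda(t) \right) \\ \dot P_c(t) &= -P_c(t)\,\varphi_\lambda(t)\,\varphi_\lambda^T(t)\,P_c(t) \end{aligned} \qquad (12.110) $$

The convergence rate can be evaluated by means of

$$ P_c^{-1}(t) = P_c^{-1}(0) + \int_0^t \varphi_\lambda(\tau)\,\varphi_\lambda^T(\tau)\, d\tau \qquad (12.111) $$

which follows from Eq. (12.110), since d(P_c^{-1})/dt = -P_c^{-1}Ṗ_c P_c^{-1} = φ_λφ_λ^T.

Although it is suboptimal, a discrete-time estimation is normally preferred for reasons of implementation. A discretization of the algorithm in Eq. (12.110) at time instants t = 0, h, ..., kh and a Riemann sum approximation of integration gives the familiar recursive least-squares identification

$$ \begin{aligned} \hat\theta_\lambda(t) &= \hat\theta_\lambda(t-h) + P_s(t)\,\varphi_f(t)\left( y_f(t) - \varphi_f^T(t)\,\hat\theta_\lambda(t-h) \right) \\ P_s^{-1}(t) &= P_s^{-1}(t-h) + \varphi_f(t)\,\varphi_f^T(t), \qquad t = 0, h, \ldots, kh \end{aligned} \qquad (12.112) $$

Manipulations of Eq. (12.112) based on the matrix inversion lemma yield a formula to update P_s instead of P_s^{-1}:

$$ P_s(t) = P_s(t-h) - \frac{P_s(t-h)\,\varphi_f(t)\,\varphi_f^T(t)\,P_s(t-h)}{1 + \varphi_f^T(t)\,P_s(t-h)\,\varphi_f(t)} \qquad (12.113) $$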

Of course, more sophisticated numerical integration routines may also be utilized. With trapezoidal interpolation, the update of P_s^{-1} in Eq. (12.112) may be replaced by

$$ P_s^{-1}(kh) = P_s^{-1}(kh-h) + \frac{1}{2}\left( \varphi_f(kh)\,\varphi_f^T(kh) + \varphi_f(kh-h)\,\varphi_f^T(kh-h) \right) \qquad (12.114) $$

to obtain a better approximation of Eq. (12.110).
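Below is a numerical sketch of the discrete-time recursions, using randomly generated regressors φ_f and noise-free outputs as hypothetical data. It does three things:

- runs the covariance update (12.113) together with the parameter update of (12.112);
- checks that the resulting P_s agrees with the inverse of the Riemann-sum accumulation of P_s^{-1} in (12.112);
- shows that the trapezoidal rule (12.114) differs from that accumulation only in the weights of the first and last terms.

```python
import numpy as np

def rls_covariance_form(phi, y, P0):
    """Parameter update of Eq. (12.112) with the covariance update of Eq. (12.113)."""
    theta = np.zeros(phi.shape[1])
    P = P0.copy()
    for phik, yk in zip(phi, y):
        Pphi = P @ phik
        P = P - np.outer(Pphi, Pphi) / (1.0 + phik @ Pphi)   # Eq. (12.113)
        theta = theta + P @ phik * (yk - phik @ theta)        # uses the updated P_s(t)
    return theta, P

def information_matrix(phi, P0, trapezoidal=False):
    """P_s^{-1} accumulated by the Riemann sum of Eq. (12.112) or the trapezoidal rule (12.114)."""
    Pinv = np.linalg.inv(P0)
    if not trapezoidal:
        for phik in phi:
            Pinv = Pinv + np.outer(phik, phik)
    else:
        for k in range(1, len(phi)):
            Pinv = Pinv + 0.5 * (np.outer(phi[k], phi[k]) + np.outer(phi[k - 1], phi[k - 1]))
    return Pinv

rng = np.random.default_rng(1)
theta_true = np.array([0.5, -1.0, 2.0])
phi = rng.standard_normal((200, 3))       # hypothetical filtered regressors
y = phi @ theta_true                      # noise-free filtered outputs
P0 = 1e6 * np.eye(3)

theta_hat, P = rls_covariance_form(phi, y, P0)
Pinv_riemann = information_matrix(phi, P0)
Pinv_trapezoid = information_matrix(phi, P0, trapezoidal=True)
print(theta_hat)
```

The covariance form avoids a matrix inversion per step, which is why it is the usual implementation. The information form makes the relation to the integral in Eq. (12.111) explicit.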


12.9 CONCLUDING REMARKS 

We have formulated an identification method for continuous-time transfer function models that is equivalent to ARMAX-model identification for discrete-time systems. The continuous-time method differs from traditional approaches to ARMAX-model identification through the reformulation of the disturbance model and the new parametrization of the continuous-time transfer function. The parameter estimation method (i.e., maximum-likelihood estimation) is the same as that used in ARMAX-model identification. As the continuous-time identification algorithm can be implemented as a discrete-time method, its advantages are attributable to the new parametrization. The methodology is of particular relevance for control systems analysis and physical modeling, where it is desirable to avoid discretization of system dynamics.

A relevant question is, of course, why there is no analogue to ARMAX models for continuous-time systems. One reason is that transfer function polynomials in the differential operator cannot be immediately used for identification owing to the implementation problems associated with differentiation. The successful ARMAX models correspond to transfer function polynomials in the forward or the backward shift operators, with advantages for modeling and signal processing, respectively, and translation between these two representations is simple. In contrast, there is no commonly used operator in parameter estimation that corresponds to the backward shift operator of discrete-time systems, although λ is a suitable candidate for the purpose.

Unfortunately, there are more circumstances that hamper the effective application of the large body of discrete-time identification methods to problems involving continuous-time transfer functions. First, the model representation formulated in the differential operator must be translated to a shift operator formulation, and there are several ways to do this. However, as exact parameter translation typically requires some matrix exponentiations, a given continuous-time parameter will have a nonlinearly distributed effect on several discrete-time parameters. Accordingly, it becomes very difficult to focus attention upon a particular continuous-time parameter. In order to monitor a particular continuous-time parameter, it is generally necessary to estimate the full-order discrete-time parameter vector. In other words, as it is difficult to separate known parameters from unknown ones, partitioning is difficult. Moreover, the discrete-time parameters become abstract, with a dependence on the sampling interval. This is a disadvantage, as the discrete-time parameters often have little physical meaning.

A related problem is how to identify accurate continuous-time transfer functions from data and, in particular, how to obtain good estimates of the zeros of a continuous-time transfer function. The difficulties in converting a discrete-time transfer function to a continuous-time transfer function are well known and related to the mapping f(z) = (log z)/h. Clearly, a poor parameter translation affects both spectral properties (such as the frequency response) and time-domain properties (such as step and impulse responses).
Another aspect is that, to avoid interference from high-frequency dynamics corrupting the estimation, discrete-time identification requires anti-aliasing filters of the input-output data before sampling. A good frequency cut-off property of a sampling filter would require noncausal operations, which cannot be implemented. A causal filter with sufficiently good damping of high frequencies may, on the other hand, eliminate too much of the useful low-frequency contents or introduce a delay. Efficient elimination of the high-frequency components is therefore difficult in the case of on-line identification. Moreover, the sampling filter will be incorporated as part of a discrete-time process model, and it is difficult to separate filter parameters from process parameters of physical significance. This situation is sometimes a dilemma where the shortcomings of discrete-time ARMAX-type approaches to estimation become obvious.

The methodology presented requires implementation of filters λ, ..., λ^n which operate on input-output data. Maximum-likelihood identification based on the parametric model results in consistent and asymptotically efficient estimates provided that the noise is normally distributed. In addition to the maximum-likelihood properties, there is evidence from simulation studies that the method has favorable properties in reproducing transient responses and frequency responses, which are relevant aspects of physical modeling. Finally, the potential for real-time application and adaptive control is very interesting, as the sampling period for the regulator may be chosen independently from that of the identification. Unlike ARMAX-model based discrete-time adaptive regulators, there is no obligation to choose a certain sampling period to satisfy the needs of both control and identification.
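APPENDIX 12.1 — THE CRAMER-RAO LOWER BOUND

The log-likelihood function for Gaussian distributed disturbances is given by

$$ \log L(\theta, R) = -\frac{1}{2}\sum_{k=1}^{N} \varepsilon_k^T(\theta)\,R^{-1}\,\varepsilon_k(\theta) - \frac{N}{2}\log\det R + \text{constant} \qquad (12.115) $$

Assuming R and θ to be independently parametrized and R to be unknown, we find

$$ \left(\frac{\partial \log L(\theta,R)}{\partial\theta}\right)^T = -\sum_{k=1}^{N} \left(\frac{\partial \varepsilon_k(\theta)}{\partial\theta}\right)^T R^{-1}\,\varepsilon_k(\theta) $$

and

$$ \frac{\partial \log L(\theta,R)}{\partial R_{ij}} = \frac{1}{2}\sum_{k=1}^{N} \varepsilon_k^T(\theta)\,R^{-1} e_i e_j^T R^{-1}\,\varepsilon_k(\theta) - \frac{N}{2}(R^{-1})_{ij} = \frac{N}{2}\left(R^{-1}R_N(\theta)R^{-1}\right)_{ij} - \frac{N}{2}(R^{-1})_{ij} \qquad (12.116) $$

where e_i is a zero vector except for 1 in the ith position, and R_N(θ) = (1/N) Σ_{k=1}^{N} ε_k(θ)ε_k^T(θ).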
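The inverse of the Fisher information matrix gives the Cramér-Rao lower bound for the covariance of an unbiased estimate:

$$ J = E\left\{ \begin{pmatrix} \left(\frac{\partial \log L(\theta,R)}{\partial\theta}\right)^T \\ \frac{\partial \log L(\theta,R)}{\partial R_{ij}} \end{pmatrix} \begin{pmatrix} \frac{\partial \log L(\theta,R)}{\partial\theta} & \frac{\partial \log L(\theta,R)}{\partial R_{ij}} \end{pmatrix} \right\} \qquad (12.117) $$

It is evaluated at the correct parameters θ_λ and R, for which ε_k(θ_λ) = w_k. Under the assumption of uncorrelated disturbances w_k, all terms E{(∂log L/∂θ)(∂log L/∂R_ij)} = 0. With ψ_k(θ) = -∂ε_k(θ)/∂θ, the remaining θ block is

$$ E\left\{ \left(\frac{\partial \log L(\theta_\lambda,R)}{\partial\theta}\right)^T \frac{\partial \log L(\theta_\lambda,R)}{\partial\theta} \right\} = E\left\{ \sum_{i=1}^{N}\sum_{j=1}^{N} \psi_i^T(\theta_\lambda)\,R^{-1} w_i w_j^T R^{-1}\,\psi_j(\theta_\lambda) \right\} = N\,E\left\{ \psi_k^T(\theta_\lambda)\,R^{-1}\,\psi_k(\theta_\lambda) \right\} $$

We conclude that, assuming the noise to be normally distributed, the estimate θ̂_N is asymptotically statistically efficient, because R_N(θ̂_N) converges to the limit lim_{N→∞} R_N(θ_λ) = R as N → ∞. ■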
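The bound can be illustrated numerically in the simplest special case. This is a linear regression with fixed, known regressors and scalar Gaussian noise w_k with variance R = σ². Here ψ_k = φ_k, and the least-squares estimate is also the maximum-likelihood estimate. The regressors, parameter values and sample sizes below are hypothetical. The Monte Carlo covariance of the estimates should agree with the inverse of the Fisher information up to sampling error. This sketch does not reproduce the asymptotic argument of the appendix for the ARMAX-type model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma = 200, 0.5
theta_true = np.array([1.0, -0.5])
Phi = np.column_stack((np.ones(N), np.linspace(-1.0, 1.0, N)))   # rows are psi_k = phi_k^T

# Fisher information: sum over k of psi_k^T R^{-1} psi_k with R = sigma^2
fisher = Phi.T @ Phi / sigma**2
crlb = np.linalg.inv(fisher)

# Monte Carlo: least-squares (= maximum-likelihood) estimates for M noise realizations
M = 4000
Y = (Phi @ theta_true)[:, None] + sigma * rng.standard_normal((N, M))
estimates = np.linalg.lstsq(Phi, Y, rcond=None)[0]     # shape (2, M)
empirical_cov = np.cov(estimates)

print(np.diag(crlb))
print(np.diag(empirical_cov))
```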


APPENDIX 12.2 — THE HESSIAN MATRIX 


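Using Eq. (12.66) it is straightforward to express the residual model as

$$ \varepsilon_k(\theta_\lambda) = y_k + \alpha_1 y_k^{(1)} + \cdots + \alpha_n y_k^{(n)} - \beta_1 u_k^{(1)} - \cdots - \beta_n u_k^{(n)} - \gamma_1 \varepsilon_k^{(1)} - \cdots - \gamma_n \varepsilon_k^{(n)} \qquad (12.118) $$

or

$$ C(z)\,\varepsilon_k(\theta_\lambda) = y_k + \alpha_1 y_k^{(1)} + \cdots + \alpha_n y_k^{(n)} - \beta_1 u_k^{(1)} - \cdots - \beta_n u_k^{(n)} \qquad (12.119) $$

From Eq. (12.119) we derive the partial derivatives

$$ \frac{\partial \varepsilon_k}{\partial \alpha_j} = \frac{1}{C(z)}\,y_k^{(j)}, \qquad \frac{\partial \varepsilon_k}{\partial \beta_j} = -\frac{1}{C(z)}\,u_k^{(j)}, \qquad \frac{\partial \varepsilon_k}{\partial \gamma_j} = -\frac{1}{C(z)}\,\varepsilon_k^{(j)} \qquad (12.120) $$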
( 12 . 120 ) 
As C(z) is not exactly known, we approximate it by an estimate Ĉ(z) based on an available estimate θ̂ and use the evaluation

$$ \frac{\partial \varepsilon_k}{\partial \alpha_j} \approx \frac{1}{\hat C(z)}\,y_k^{(j)}, \qquad \frac{\partial \varepsilon_k}{\partial \beta_j} \approx -\frac{1}{\hat C(z)}\,u_k^{(j)}, \qquad \frac{\partial \varepsilon_k}{\partial \gamma_j} \approx -\frac{1}{\hat C(z)}\,\varepsilon_k^{(j)} \qquad (12.121) $$

The matrix second-order derivative of the loss function V_N(θ) is

$$ \nabla^2 V_N(\theta) = \frac{2}{N}\sum_{k=1}^{N} \psi_k^T(\theta)\,R_N^{-1}(\theta)\,\psi_k(\theta) + \cdots \qquad (12.122) $$

where the dots stand for a second and a third term. According to the assumptions made concerning uncorrelated disturbances with {ε_k(θ_λ)} = {w_k}, it is possible to neglect the second and third terms close to the stationary point θ̂_N = θ_λ. ■


APPENDIX 12.3 — PROOF OF THEOREM 12.1 


Let y_i be the output of i serial operators λ operating on y:

$$ y_i(t) = [\lambda^i y](t) $$

The transfer operator from the input u to the output y is then

$$ y(t) = \frac{B'(p)}{A'(p)}\,u(t) \qquad (12.123) $$

with

$$ \begin{aligned} A'(p) &= A(p)(p+a)^n = p^{2n} + a'_1 p^{2n-1} + \cdots + a'_{2n} \\ B'(p) &= B(p)(p+a)^n = b'_1 p^{2n-1} + \cdots + b'_{2n} \end{aligned} \qquad (12.124) $$

A fractional form for Eq. (12.123) is

$$ \begin{aligned} A'(p)\,\xi'(t) &= u(t) \\ y(t) &= B'(p)\,\xi'(t) \end{aligned} \qquad (12.84) $$
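The state-space realization on the controllable canonical form is given by

$$ \frac{d}{dt}\begin{pmatrix} x'_1(t) \\ x'_2(t) \\ \vdots \\ x'_{2n}(t) \end{pmatrix} = \begin{pmatrix} -a'_1 & -a'_2 & \cdots & -a'_{2n-1} & -a'_{2n} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} x'_1(t) \\ x'_2(t) \\ \vdots \\ x'_{2n}(t) \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} u(t), \qquad y(t) = \begin{pmatrix} b'_1 & \cdots & b'_{2n} \end{pmatrix} x'(t) \qquad (12.85) $$

where the state vector components are given by

$$ x'_i(t) = p^{2n-i}\,\xi'(t) \qquad (12.86) $$

Consider now the fractional form relating u and y

$$ \begin{aligned} A(p)\,\xi(t) &= u(t) \\ y(t) &= B(p)\,\xi(t) \end{aligned} \qquad (12.87) $$

with

$$ (p+a)^n\,\xi'(t) = \xi(t) \qquad (12.88) $$

With this state representation it holds that

$$ \begin{aligned} [\lambda u](t) &= \frac{a}{p+a}\,A(p)(p+a)^n\,\xi'(t) = a\left(p^{2n-1}\xi'(t)\right) + \cdots + a^n a_n \left(p^0\xi'(t)\right) = a\,x'_1(t) + \cdots + a^n a_n\,x'_{2n}(t) \\ [\lambda^i u](t) &= A(p)\,a^i (p+a)^{n-i}\,\xi'(t) \\ [\lambda^i y](t) &= B(p)\,a^i (p+a)^{n-i}\,\xi'(t) \\ [\lambda^n y](t) &= a^n \begin{pmatrix} 0 & \cdots & 0 & b_1 & \cdots & b_n \end{pmatrix} x'(t) \end{aligned} $$

This means that all components of φ_λ may be expressed as linear combinations of the components of x'. Hence

$$ \varphi_\lambda(t) = M' x'(t) \qquad (12.125) $$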
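The next step is to show that x' may be expressed as a linear transformation of φ_λ, based on the fractional form expressed in the operator λ,

$$ \begin{aligned} A^*(\lambda)\,\xi_\lambda(t) &= u(t) \\ y(t) &= B^*(\lambda)\,\xi_\lambda(t) \end{aligned} \qquad (12.126) $$

with co-prime A* and B* and with

$$ \xi_\lambda(t) = (p+a)^n\,\xi(t) \qquad (12.127) $$

Recall that

$$ (p+a)^n\,\xi'(t) = \xi(t) \qquad (12.88) $$

This gives that

$$ \xi'(t) = \left(\frac{1}{p+a}\right)^{2n} \xi_\lambda(t) \qquad (12.128) $$

From Eq. (12.86) and Eq. (12.128) it is found that

$$ x'_i(t) = \frac{p^{2n-i}}{(p+a)^{2n}}\,\xi_\lambda(t) = P_i(\lambda)\,\xi_\lambda(t) \qquad (12.129) $$

where P_i, for 1 ≤ i ≤ 2n, are polynomials in the operator λ:

$$ P_i(\lambda) = \frac{\left((p+a) - a\right)^{2n-i}}{(p+a)^{2n}} = a^{-i}\left(\frac{a}{p+a}\right)^i \left(1 - \frac{a}{p+a}\right)^{2n-i} $$

It can be seen from the following relationship that all P_i's contain only powers of λ from 1 to 2n (in particular, no constant term).

$$ P_i(\lambda) = a^{-i}\lambda^i (1-\lambda)^{2n-i} = a^{-i}\sum_{j=0}^{2n-i} \binom{2n-i}{j} (-1)^j \lambda^{i+j} \qquad (12.130) $$

The factorization polynomials A*(λ) and B*(λ) are co-prime. Since the ring of polynomials over a field is a Euclidean domain, the Diophantine equations

$$ A^*(\lambda)\,R_i^*(\lambda) + B^*(\lambda)\,S_i^*(\lambda) = P_i(\lambda), \qquad i = 1, \ldots, 2n \qquad (12.131) $$

have solutions for all i in the given interval. Among them there are solutions of the form

$$ \begin{aligned} R_i^*(\lambda) &= r_{i1}\lambda + \cdots + r_{in}\lambda^n \\ S_i^*(\lambda) &= s_{i1}\lambda + \cdots + s_{in}\lambda^n \end{aligned} \qquad (12.132) $$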
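From Eq. (12.126) and Eq. (12.131) it is found for i = 1, ..., 2n that

$$ R_i^*(\lambda)\,u(t) + S_i^*(\lambda)\,y(t) = P_i(\lambda)\,\xi_\lambda(t) = x'_i(t) \qquad (12.133) $$

Because R_i* and S_i* contain only the powers λ, ..., λ^n, this can be written as

$$ x'_i(t) = \begin{pmatrix} s_{i1} & \cdots & s_{in} & r_{i1} & \cdots & r_{in} \end{pmatrix} \varphi_\lambda(t) \qquad (12.134) $$

with

$$ \varphi_\lambda^T(t) = \begin{pmatrix} [\lambda y](t) & \cdots & [\lambda^n y](t) & [\lambda u](t) & \cdots & [\lambda^n u](t) \end{pmatrix} $$

Let the matrix T_λ be

$$ T_\lambda = \begin{pmatrix} s_{11} & \cdots & s_{1n} & r_{11} & \cdots & r_{1n} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{(2n)1} & \cdots & s_{(2n)n} & r_{(2n)1} & \cdots & r_{(2n)n} \end{pmatrix} \qquad (12.135) $$

Then it holds that

$$ x'(t) = T_\lambda\,\varphi_\lambda(t) \qquad (12.136) $$

It can be concluded from Eq. (12.125) and Eq. (12.136) that

$$ T_\lambda^{-1} = M' \qquad (12.137) $$

Hence, T_λ is an invertible matrix relating x' and φ_λ.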

12.11 EXERCISES 


12.1 Parametrization 

Assume that a DC-motor may be described by a transfer function G_0(s) between input u (= voltage) and output y (= angular velocity)

$$ G_0(s) = \frac{Y(s)}{U(s)} = \frac{K}{Js + D} \qquad (12.138) $$

The parameters K, J, D denote gain, moment of inertia, and damping, respectively.

a. Determine a continuous-time parametrization for identification of K, J, and D. What parameters are identifiable from input-output data? What extra information might be needed for full identifiability?

b. Formulate a recursive estimation algorithm to find the parameters. 
12.2 Consider the set-up of Exercise 8.1. Assume that there is a known sinusoidal measurement disturbance at the frequency ω_0 [rad/s].

a. Formulate a regressor filtering that effectively reduces the estimation error.

b. What implementation aspects may be important in the context of regressor filtering?

12.3 Devise a software procedure to plot the frequency response for the DC-motor identification. Also include error bounds of the estimated Bode diagram based on the Parseval relation

$$ \begin{aligned} \varepsilon(t) &= y(t) - \hat\theta^T\varphi(t) \\ \varepsilon(s) &= \mathcal{L}\{\varepsilon(t)\}(s) = Y(s) - \hat\theta^T\Phi(s) = \left(G_0(s) - \hat G_0(s)\right)U(s) \\ \|\varepsilon\|^2 &= \langle \varepsilon(s), \varepsilon(s) \rangle = \|G_0(s) - \hat G_0(s)\|^2\,\|U(s)\|^2 \end{aligned} \qquad (12.139) $$

A Nyquist or Nichols diagram may substitute for the Bode diagram representation.

12.4 Show that the operator λ can be used to formulate a parametric model for estimation of physical coefficients of the robot equations M(q)q̈ + C(q, q̇)q̇ + G(q) = τ when q and q̇, but not q̈, are available for measurement.

Hint: Show that filtering of the Euler-Lagrange equations (7.5) by λ gives the relationship

$$ \frac{1}{\tau_0}\frac{\partial L}{\partial \dot q} + \lambda\left\{ -\frac{1}{\tau_0}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} \right\} = \lambda\{\tau\}, \qquad \text{where } \lambda = \frac{1}{1 + s\tau_0} \qquad (12.140) $$

and where L = (1/2)q̇^T M(q)q̇ - U(q), and τ are the applied external forces. ■
Multidimensional Identification


13.1 INTRODUCTION 

The problems presented in previous chapters all had time as the independent coordinate variable. There are, however, physical coordinates which in some cases can be regarded as independent variables. Several standard problems deal with both temporal and spatial coordinates as independent variables, and the corresponding physical modeling gives rise to partial differential equations. Measurement devices that are adapted to this kind of modeling are often found in the form of sensor arrays in one or several spatial dimensions. Another example is geometric modeling and image processing in two dimensions (2D) and three dimensions (3D), which usually are represented by discretized models.

Figure 13.1 Industrial materials handling with control actions u_1, ..., u_m affecting the material distribution and product quality measured in the variables y_1, ..., y_n after some time delay τ.

In the case of discrete variables there is actually no clear-cut distinction between multivariate, multidimensional, and multi-input multi-output systems. Before considering the many difficult problems in multidimensional identification, it is worth mentioning that many problems that appear to be multidimensional at first sight may sometimes be reformulated as standard least-squares problems.

Example 13.1—Industrial materials handling 

Consider the industrial materials handling problem in Fig. 13.1, where the control actions u_1, ..., u_m affect the material distribution through a set of nozzles and where the product quality is measured in the variables y_1, ..., y_n after some time delay τ. Let the effect of the input variables on each one of the output variables be described by a matrix Θ, and let the vector-valued measurements and control actions from time t = t_1, ..., t_N be organized as

$$ \begin{pmatrix} y_1(t_1+\tau) & \cdots & y_1(t_N+\tau) \\ \vdots & & \vdots \\ y_n(t_1+\tau) & \cdots & y_n(t_N+\tau) \end{pmatrix} = \Theta \begin{pmatrix} u_1(t_1) & \cdots & u_1(t_N) \\ \vdots & & \vdots \\ u_m(t_1) & \cdots & u_m(t_N) \end{pmatrix} \qquad (13.1) $$

or

$$ \mathcal{Y} = \Theta\,\mathcal{U}, \qquad \Theta \in \mathbb{R}^{n\times m} \qquad (13.2) $$
The least-squares estimate of Θ is

$$ \hat\Theta = \mathcal{Y}\,\mathcal{U}^T \left(\mathcal{U}\,\mathcal{U}^T\right)^{-1} \qquad (13.3) $$

and we conclude that the problem of finding the interaction matrix Θ can be solved with well-known methods. ■
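The following is a short sketch of Eq. (13.3) with synthetic, noise-free data. The dimensions n, m, N and the random matrices are hypothetical. Solving the normal equations with np.linalg.solve avoids forming the inverse of 𝒰𝒰^T explicitly. This is numerically preferable but gives algebraically the same estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 3, 4, 50                     # outputs, nozzles (inputs), samples
Theta_true = rng.standard_normal((n, m))
U = rng.standard_normal((m, N))        # column k holds u_1(t_k), ..., u_m(t_k)
Y = Theta_true @ U                     # column k holds y_1(t_k + tau), ..., y_n(t_k + tau)

# Eq. (13.3): Theta_hat = Y U^T (U U^T)^{-1}, computed by solving
# (U U^T) Theta_hat^T = U Y^T, which is valid because U U^T is symmetric
Theta_hat = np.linalg.solve(U @ U.T, U @ Y.T).T
print(np.max(np.abs(Theta_hat - Theta_true)))
```

With measurement noise the same expression gives the least-squares fit. Estimation then requires N ≥ m and input sequences that make 𝒰𝒰^T nonsingular.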

In order to formulate methods for algebraic and spectral analysis applicable to more complicated systems we need the multidimensional Laplace and Fourier transforms. We start this chapter by giving a short overview of the multidimensional Laplace transform and the associated transfer function algebra. We also give some attention to the multidimensional fast Fourier transform and alternatives such as the Walsh/Hadamard transform.


13.2 TWO-DIMENSIONAL TRANSFORMS 


The two-dimensional Laplace transform is defined via

$$ X(s_1, s_2) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x(t_1, t_2)\exp(-s_1 t_1 - s_2 t_2)\, dt_1\, dt_2 \qquad (13.4) $$

for a variable x(t_1, t_2) that depends on two coordinates t_1, t_2. The first quadrant transform

$$ \mathcal{L}_2\{x(t_1, t_2)\} = X(s_1, s_2) = \int_0^{+\infty}\int_0^{+\infty} x(t_1, t_2)\exp(-s_1 t_1 - s_2 t_2)\, dt_1\, dt_2 \qquad (13.5) $$

is the Laplace transform restricted to integration over positive t_1, t_2 and hence corresponds to the one-sided Laplace transform for one-dimensional signals.

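The dependence of the output of a linear causal system on its input is characterized by the two-dimensional convolution equation

$$ y(t_1, t_2) = \int_0^{t_1}\int_0^{t_2} g(\tau_1, \tau_2)\,u(t_1 - \tau_1, t_2 - \tau_2)\, d\tau_1\, d\tau_2 \qquad (13.6) $$

and the transfer function

$$ \frac{Y(s_1, s_2)}{U(s_1, s_2)} = G(s_1, s_2) = \mathcal{L}_2\{g(t_1, t_2)\} \qquad (13.7) $$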


Note that the transform algebra is particularly straightforward for separable functions, i.e., functions whose two-dimensional Laplace transform factors as L_2{f(t_1, t_2)} = F(s_1, s_2) = F_1(s_1)F_2(s_2) for some functions F_1 and F_2 that each depend on only one variable.

If we turn our attention to discrete models, we have the two-dimensional z-transform of a signal x(t_1, t_2) discretized as the array {x_ij}, defined as

$$ X_z(z_1, z_2) = \mathcal{Z}_2\{x\} = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} x_{ij}\, z_1^{-i} z_2^{-j} \qquad (13.8) $$

Let us consider a two-dimensional weighting function h(t_1, t_2) discretized as the array {h_ij}. The discrete transfer function in two dimensions is defined as

$$ H(z_1, z_2) = \mathcal{Z}_2\{h\} = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h_{ij}\, z_1^{-i} z_2^{-j} \qquad (13.9) $$

where h(t_1, t_2) is the weighting function and {h_ij} are its discretized values according to some sampling principle. An important case is the first quadrant transfer function, where h(t_1, t_2) = 0 for t_1 < 0 or t_2 < 0, which corresponds to a causal system in the one-dimensional case. The first quadrant discrete transfer function in two dimensions is

$$ H(z_1, z_2) = \mathcal{Z}_2\{h\} = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} h_{ij}\, z_1^{-i} z_2^{-j} \qquad (13.10) $$

where h(t_1, t_2) is the weighting function.


13.3 TWO-DIMENSIONAL SYSTEM ANALYSIS


First quadrant transfer functions are easy to implement because the output value y_mn at a given point depends only on those points (i, j) of the input and output sequences for which i ≤ m and j ≤ n; see Fig. 13.2. A first quadrant transfer function with denominator polynomial A(z_1, z_2) and numerator polynomial B(z_1, z_2) allows a reformulation of the transfer function polynomials to the denominator polynomial A*(z_1^{-1}, z_2^{-1}) and numerator polynomial B*(z_1^{-1}, z_2^{-1}) so that

$$ H(z_1, z_2) = \frac{Y(z_1, z_2)}{U(z_1, z_2)} = \frac{B(z_1, z_2)}{A(z_1, z_2)} = \frac{B^*(z_1^{-1}, z_2^{-1})}{A^*(z_1^{-1}, z_2^{-1})}, \qquad A^*(0, 0) = 1 \qquad (13.11) $$

Figure 13.2 The computation of y_mn in a first quadrant system only requires values of y_ij and u_ij for i ≤ m and j ≤ n.


Realization of the transfer function in Eq. (13.11) as a filter is straightforward:

$$ y(t_1, t_2) = \left(1 - A^*(z_1^{-1}, z_2^{-1})\right) y(t_1, t_2) + B^*(z_1^{-1}, z_2^{-1})\,u(t_1, t_2) \qquad (13.12) $$

where z_1^{-1} and z_2^{-1} are interpreted as causal shift operators, i.e., the autoregressive dependency goes for each shift operator in one direction only; see Fig. 13.2.
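The recursion (13.12) can be sketched directly on arrays. The function below is a hypothetical helper, not a library routine. It assumes the coefficients of A* and B* are given as 2D arrays indexed by the powers of z_1^{-1} and z_2^{-1}, and that samples outside the first quadrant are zero. Visiting the points in increasing row and column order guarantees that every earlier output needed by the recursion is already available.

```python
import numpy as np

def filter2d_first_quadrant(a, b, u):
    """Realize Eq. (13.12): y = (1 - A*) y + B* u for a first quadrant filter.

    a[i, j] and b[i, j] are the coefficients of z1^-i z2^-j in A* and B*, with a[0, 0] = 1.
    The input is indexed as u[m, n]; samples outside the array are taken as zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u = np.asarray(u, dtype=float)
    if a[0, 0] != 1.0:
        raise ValueError("normalization A*(0, 0) = 1 is required")
    M, N = u.shape
    y = np.zeros((M, N))
    for m in range(M):            # increasing m and n: every y[m - i, n - j]
        for n in range(N):        # used below has already been computed
            acc = 0.0
            for i in range(min(b.shape[0], m + 1)):
                for j in range(min(b.shape[1], n + 1)):
                    acc += b[i, j] * u[m - i, n - j]
            for i in range(min(a.shape[0], m + 1)):
                for j in range(min(a.shape[1], n + 1)):
                    if i or j:
                        acc -= a[i, j] * y[m - i, n - j]
            y[m, n] = acc
    return y

# Impulse response of H = 1 / (1 - 0.5 z1^-1 - 0.5 z2^-1)
impulse = np.zeros((5, 5))
impulse[0, 0] = 1.0
h = filter2d_first_quadrant([[1.0, -0.5], [-0.5, 0.0]], [[1.0]], impulse)
print(h[2, 2])   # 0.375 = binom(4, 2) / 2**4
```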


13.4 STABILITY 


Let us consider first quadrant transfer functions of the form of a rational function

$$ H(z_1, z_2) = \frac{B(z_1, z_2)}{A(z_1, z_2)} \qquad (13.13) $$

where A(z_1, z_2) and B(z_1, z_2) are mutually prime (i.e., A and B have no common factors). This transfer function can be expanded as

$$ H(z_1, z_2) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} h_{ij}\, z_1^{-i} z_2^{-j} \qquad (13.14) $$

A filter of the first quadrant is stable if and only if

$$ \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |h_{ij}| < \infty \qquad (13.15) $$
There are also algebraic stability criteria similar to pole-zero analysis for one-dimensional systems. The transfer function H is said to have a nonessential singularity of the first kind at the points

$$ \{(z_1, z_2) : A(z_1, z_2) = 0,\ B(z_1, z_2) \neq 0\} \qquad (13.16) $$

which correspond to the poles of a one-dimensional system. In addition, there are singularities reminiscent of pole-zero cancellations but without any real counterpart for one-dimensional systems. The nonessential singularity of the second kind is defined as

$$ \{(z_1, z_2) : A(z_1, z_2) = 0,\ B(z_1, z_2) = 0\} \qquad (13.17) $$

Note that a nonessential singularity of the second kind may occur even though A and B are mutually prime. The reason is that the pole surface A(z_1, z_2) = 0 and the zero surface B(z_1, z_2) = 0 can intersect in ℂ² without A and B sharing a common factor. A sufficient but not necessary condition of stability for discrete systems in two variables is

$$ A(z_1, z_2) \neq 0 \quad \forall (z_1, z_2) \in \{(z_1, z_2) : |z_1| \geq 1,\ |z_2| \geq 1\} \qquad (13.18) $$

The following example is a counterexample to show that Eq. (13.18) is not a necessary condition of stability.

Example 13.2—Stability of two-dimensional transfer functions 
Consider the two transfer functions

$$ H_1(z_1, z_2) = \frac{(1 - z_1)^8 (1 - z_2)^8}{2 - z_1 - z_2}, \qquad H_2(z_1, z_2) = \frac{(1 - z_1)(1 - z_2)}{2 - z_1 - z_2} \qquad (13.19) $$

The transfer functions H_1(z_1, z_2) and H_2(z_1, z_2) both have a nonessential singularity of the second kind at (z_1, z_2) = (1, 1). However, a closer investigation of the expansions of H_1 and H_2 according to Eq. (13.15) shows that the transfer function H_1 is stable whereas H_2 is unstable. Thus, there is an effect of the numerator on the system stability, so that the condition in Eq. (13.18) is sufficient but not necessary for stability. ■
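The difference between H_1 and H_2 can be observed numerically by expanding both as power series and watching the partial sums of |h_ij| in Eq. (13.15). The sketch assumes that z_1 and z_2 in Eq. (13.19) are the variables in which the first-quadrant series is taken. The code writes them w1, w2 so as not to commit to a sign convention for the exponent.

The coefficients of 1/(2 - w_1 - w_2) follow from the recursion 2h_ij = h_(i-1)j + h_i(j-1), and the numerators are applied by repeated first differences. A finite computation cannot prove convergence. It only shows that the partial sums for H_2 are still growing at large truncation sizes, while those for H_1 have settled.

```python
import numpy as np

def base_coefficients(K):
    """Coefficients h[i, j], i, j <= K, of 1/(2 - w1 - w2) = sum h[i, j] w1^i w2^j.

    From (2 - w1 - w2) * sum h w1^i w2^j = 1: h[0, 0] = 1/2 and
    h[i, j] = (h[i-1, j] + h[i, j-1]) / 2 otherwise (negative indices give zero).
    """
    h = np.zeros((K + 1, K + 1))
    for i in range(K + 1):
        for j in range(K + 1):
            if i == 0 and j == 0:
                h[i, j] = 0.5
            else:
                up = h[i - 1, j] if i > 0 else 0.0
                left = h[i, j - 1] if j > 0 else 0.0
                h[i, j] = 0.5 * (up + left)
    return h

def apply_numerator(h, power):
    """Multiply the truncated series by (1 - w1)^power (1 - w2)^power.

    Each factor (1 - w) is a first difference along one axis with a zero prepended,
    so the coefficients with indices <= K stay exact.
    """
    g = h
    for _ in range(power):
        g = np.diff(g, axis=0, prepend=0.0)
        g = np.diff(g, axis=1, prepend=0.0)
    return g

def partial_abs_sum(g, K):
    """Sum of |g[i, j]| over 0 <= i, j <= K."""
    return np.abs(g[:K + 1, :K + 1]).sum()

h = base_coefficients(400)
g1 = apply_numerator(h, 8)   # H_1
g2 = apply_numerator(h, 1)   # H_2
for K in (50, 100, 400):
    print(K, partial_abs_sum(g1, K), partial_abs_sum(g2, K))
```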

Let us now consider continuous-time dynamical systems. A condition for stability of a causal two-dimensional dynamical system

$$ G(s_1, s_2) = \frac{B(s_1, s_2)}{A(s_1, s_2)} \qquad (13.20) $$

is that A(s_1, s_2) ≠ 0 for Re(s_1) ≥ 0 and Re(s_2) ≥ 0.

Figure 13.3 One-dimensional heat flow along the spatial coordinate x with control action u(t) and measurement y(t) obtained as a temperature T(x_0, t) at a distance x_0 from the control input.

Example 13.3—A partial differential equation for heat flow analysis 
Consider the one-dimensional heat flow problem shown in Fig. 13.3, where the temperature T = T(x, t) is controlled by means of heating at the boundary at x = 0. This control variable is denoted u(t) and is balanced by the fact that the temperature at x = 1 is zero. Assume that the temperature measurement at the point x = x_0 constitutes the observed variable

$$ y(t) = T(x_0, t), \qquad 0 < x_0 < 1 \qquad (13.21) $$

The one-dimensional heat flow equation is

$$ \rho\,\frac{\partial T(x,t)}{\partial t} = \frac{\partial^2 T(x,t)}{\partial x^2} \qquad (13.22) $$

where 1/ρ is a thermal diffusion coefficient. A typical temperature pulse response in spatial and temporal coordinates is shown in Fig. 13.4. The relevant boundary conditions and initial conditions are

$$ \begin{aligned} T(0, t) &= u(t) \\ T(1, t) &= 0 \\ T(x, 0) &= T_0(x) \end{aligned} \qquad (13.23) $$

Figure 13.4 Spatial and temporal response of temperature T(x, t) to a rectangular-formed pulse input u(t) (panel: Temperature response).

Suppose that we forget about initial conditions (i.e., put T_0(x) = 0) and look for the transfer function from the input u to the temperature y(t) = T(x_0, t) at some given point x_0 along the x-axis. The Laplace transform of Eq. (13.22) with respect to time t is then


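$$ \begin{aligned} \frac{1}{\rho}\frac{\partial^2 T_t(x,s)}{\partial x^2} - s\,T_t(x,s) &= 0 \\ T_t(0,s) &= U(s) \\ T_t(1,s) &= 0 \end{aligned} \qquad (13.24) $$

Solving the second-order differential equation for T_t(x, s) and using the boundary conditions yields the transfer function

$$ G(s, \rho) = \frac{Y(s)}{U(s)} = \frac{T_t(x_0, s)}{U(s)} = \frac{\sinh\left((1 - x_0)\sqrt{\rho s}\right)}{\sinh\sqrt{\rho s}} \qquad (13.25) $$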
The Nyquist diagram for some different measurement points x_0 is shown in Fig. 13.5. This transfer function exhibits a complicated dependence on ρ, with poles at s = -k²π²/ρ for k = 1, 2, ... and also an infinite number of zeros. (The point s = 0 is not a pole, since G(s, ρ) → 1 - x_0 as s → 0.) The parameter ρ appears in both the numerator and the denominator of the transfer function. This complicated dependence motivates the term distributed parameter, and systems described by such parameters are accordingly called distributed parameter systems.

Now consider the Laplace transform with respect to both x and t, and denote the transformed complex frequency variables by s_x and s_t, respectively. The two-dimensional Laplace transform of Eq. (13.22) in temporal frequency s_t and spatial frequency s_x is

$$ s_x^2\,T_{x,t}(s_x, s_t) - s_x\,T_t(0, s_t) - \rho\,s_t\,T_{x,t}(s_x, s_t) = 0 \qquad (13.26) $$

with the boundary conditions

$$ \begin{aligned} T_t(0, s_t) &= U(s_t) \\ T_t(1, s_t) &= 0 \end{aligned} \qquad (13.27) $$

By substituting the boundary condition we obtain

$$ s_x^2\,T_{x,t}(s_x, s_t) - \rho\,s_t\,T_{x,t}(s_x, s_t) = s_x\,U(s_t) \qquad (13.28) $$

The two-dimensional transfer function relationship is

$$ T_{x,t}(s_x, s_t) = \frac{s_x}{s_x^2 - \rho\,s_t}\,U(s_t) \qquad (13.29) $$

The transfer function exhibits a nonessential singularity of the first kind at

$$ s_x = \pm\sqrt{\rho\,s_t} \qquad (13.30) $$

This type of singularity corresponds to poles in a one-dimensional system, whereas the nonessential singularity of the second kind is s_x = s_t = 0. Notice that the expression √(ρ s_t) for the nonessential singularity of the first kind appears in the transfer function of Eq. (13.25).

As the thermal diffusion takes place in both directions, there is no simple one-directional dependence along the x-axis. The system is for this reason not a first quadrant system, and it is more difficult to design a filtering solution to the identification problem. Identification can, however, be made by minimizing the functional

$$ \hat\rho = \arg\min_\rho J(\rho) = \arg\min_\rho \sum_k \left| G(i\omega_k, \rho) - \frac{Y_N(i\omega_k)}{U_N(i\omega_k)} \right|^2 \qquad (13.31) $$
where the transfer function G in Eq. (13.25) is fitted to the ratio of the discrete Fourier transforms Y_N and U_N. The cost functional is shown in Fig. 13.6 for the case ρ = 100, and this explicit optimization obviously solves the identification problem. ■

Figure 13.6 Determination of the thermal diffusion coefficient ρ by explicit minimization of the functional in Eq. (13.31).
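Here is a minimal sketch of the identification in Eq. (13.31). It rests on three assumptions:

- the frequency-response data are noise-free and synthetic, generated from Eq. (13.25) with ρ = 100 as in Fig. 13.6;
- the measurement point is a hypothetical x_0 = 0.3;
- a simple grid search over ρ replaces a numerical optimizer.

With measured data, G_measured would be replaced by the ratio Y_N(iω_k)/U_N(iω_k) of discrete Fourier transforms. The example also checks the static gain G(0, ρ) = 1 - x_0 implied by Eq. (13.25).

```python
import numpy as np

def heat_tf(s, rho, x0):
    """G(s, rho) = sinh((1 - x0) sqrt(rho s)) / sinh(sqrt(rho s)), Eq. (13.25)."""
    r = np.sqrt(rho * np.asarray(s, dtype=complex))
    return np.sinh((1.0 - x0) * r) / np.sinh(r)

def estimate_rho(omega, G_measured, x0, rho_grid):
    """Grid-search minimizer of J(rho) = sum_k |G(i omega_k, rho) - G_measured_k|^2, Eq. (13.31)."""
    costs = np.array([np.sum(np.abs(heat_tf(1j * omega, rho, x0) - G_measured) ** 2)
                      for rho in rho_grid])
    return rho_grid[np.argmin(costs)], costs

x0 = 0.3
omega = np.linspace(0.01, 1.0, 50)                 # rad/s
G_measured = heat_tf(1j * omega, 100.0, x0)        # stands in for Y_N / U_N
rho_grid = np.linspace(50.0, 150.0, 101)
rho_hat, costs = estimate_rho(omega, G_measured, x0, rho_grid)
print(rho_hat)                                      # 100.0
print(heat_tf(1e-12, 100.0, x0))                    # close to 1 - x0 = 0.7
```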

A pragmatic approach to this identification problem is, of course, to fit a low 
order rational transfer function to data. 


13.5 DELAY-DIFFERENTIAL SYSTEMS 

There are several modeling problems arising, for instance, in physics and technology where the evolution of the system depends not only on the present state but also on past data. Identification problems of this type are obtained for processes with reflections, echoes, and material flows with transport delays and recirculation dynamics. Such temporal dependencies give rise to differential equations including both differential operators and time-delay operators, and such systems are for this reason called differential-difference systems or delay-differential systems.

Example 13.4—identification of a delay-differential system 
A typical problem is to estimate the parameters a_1, a_2, b of the model

$$ \begin{aligned} \dot x(t) &= -a_1 x(t) + a_2 x(t - \tau) + b\,u(t - \tau) \\ y(t) &= x(t) \end{aligned} \qquad (13.32) $$

where the time delay τ is supposed to be known. Notice that the model in Eq. (13.32) is a linear system with the transfer function

$$ G(s) = \frac{Y(s)}{U(s)} = \frac{b\,e^{-s\tau}}{s + a_1 - a_2 e^{-s\tau}} \qquad (13.33) $$

which is not a rational transfer function because of the denominator term proportional to e^{-sτ}. It is clear from Fig. 13.7 that the impulse response, the Bode diagram, and the Nyquist diagram all exhibit complicated characteristics despite the apparent simplicity of the parametric model in Eq. (13.33) or Eq. (13.32). ■
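The frequency response of Eq. (13.33) is easy to evaluate directly. The parameter values in this sketch are hypothetical (a_1 = 1, a_2 = 0.5, b = 1, τ = 10). They are chosen so that the static gain b/(a_1 - a_2) = 2 is finite. The sketch shows the non-monotone magnitude caused by the delayed feedback term: e^{-iωτ} equals -1 at ω = π/τ and +1 at ω = 2π/τ.

```python
import numpy as np

def delay_tf(omega, a1, a2, b, tau):
    """G(i omega) = b e^{-i omega tau} / (i omega + a1 - a2 e^{-i omega tau}), Eq. (13.33)."""
    s = 1j * np.asarray(omega, dtype=float)
    delay = np.exp(-s * tau)
    return b * delay / (s + a1 - a2 * delay)

a1, a2, b, tau = 1.0, 0.5, 1.0, 10.0
print(abs(delay_tf(0.0, a1, a2, b, tau)))              # static gain b / (a1 - a2) = 2
print(abs(delay_tf(np.pi / tau, a1, a2, b, tau)))      # delay term = -1: about 0.65
print(abs(delay_tf(2 * np.pi / tau, a1, a2, b, tau)))  # delay term = +1: about 1.25
```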

A transfer function algebra similar to that in Chapter 12 can be proposed in order to solve this problem. The operators may be, for instance, the differential operator p and some time-delay operator. Another choice is the pair of causal operators

$$ \lambda_1 = \frac{1}{1 + p\tau_1}, \qquad \lambda_2 = e^{-p\tau_2} \qquad (13.34) $$

for some positive time constants $\tau_1, \tau_2$, which constitute a set of design parameters. These operators are a good choice because they are causal, stable operators with finite gain, and the operators commute in the context of transfer functions. The operator $\lambda_1$ is a low-pass filter operator similar to that in Chapter 12, and $\lambda_2$ represents a time delay $\tau_2$. Notice also that the transformation results in new parameters which are linear with respect to the original parameters. The transfer function polynomials are then

$$ A(\lambda_1, \lambda_2)\, y(t) = B(\lambda_1, \lambda_2)\, u(t) \qquad (13.35) $$



[Figure panels: impulse response; Bode magnitude versus frequency [Hz]; Nyquist diagram, Im G(iω) versus Re G(iω)]

Figure 13.7 Impulse response, Bode diagram, and Nyquist diagram for a delay-differential system with the transfer function in Eq. (13.33) and parameters $a_1 = 1$, $a_2 = 1$, $b = 1$, and $\tau = 10$.


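for polynomials in two indeterminates $\lambda_1$ and $\lambda_2$,

$$ A(\lambda_1,\lambda_2) = \sum_{i=0}^{n_{1A}} \sum_{j=0}^{n_{2A}} a_{ij} \lambda_1^i \lambda_2^j, \qquad B(\lambda_1,\lambda_2) = \sum_{i=0}^{n_{1B}} \sum_{j=0}^{n_{2B}} b_{ij} \lambda_1^i \lambda_2^j, \qquad a_{00} = 1 \qquad (13.36) $$

for some finite orders $n_{1A}, n_{2A}, n_{1B}, n_{2B}$. It is clear from Eq. (13.36) that the system is a first-quadrant system, which facilitates a filtering solution to identification. The condition $a_{00} = 1$ is imposed for the purpose of normalization and is possible if the transfer function is both causal and proper. Equation (13.35) then reads

$$ \sum_{i=0}^{n_{1A}} \sum_{j=0}^{n_{2A}} a_{ij} \lambda_1^i \lambda_2^j\, y(t) = \sum_{i=0}^{n_{1B}} \sum_{j=0}^{n_{2B}} b_{ij} \lambda_1^i \lambda_2^j\, u(t) \qquad (13.37) $$

and we formulate in the standard fashion

$$ y(t) = \begin{bmatrix} -\lambda_1 y & \cdots & -\lambda_1^{n_{1A}} \lambda_2^{n_{2A}} y & u & \cdots & \lambda_1^{n_{1B}} \lambda_2^{n_{2B}} u \end{bmatrix} \begin{bmatrix} a_{10} \\ \vdots \\ a_{n_{1A} n_{2A}} \\ b_{00} \\ \vdots \\ b_{n_{1B} n_{2B}} \end{bmatrix} \qquad (13.38) $$

which is in linear regression form $y(t) = \phi^T(t)\theta$. The constant parameters $\theta$ may now be estimated by any suitable method for identification of a linear model. A natural choice is to sample $y(t)$ and $\phi(t)$. Clearly, the bandwidth of $\lambda_1$ and the sampling period are design parameters which can be modified in order to improve the estimation.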

Example 13.5—Identification of a delay-differential system (cont'd.)
Consider the following delay-differential equation for the system

$$ \dot{x}(t) = -a_1 x(t) + a_2 x(t-\tau) + b\,u(t-\tau), \qquad y(t) = x(t) \qquad (13.39) $$

with the transfer operator

$$ \frac{Y(\lambda_1, \lambda_2)}{U(\lambda_1, \lambda_2)} = \frac{b\,\lambda_1 \lambda_2}{1 + (a_1 - 1)\lambda_1 - a_2 \lambda_1 \lambda_2} \qquad (13.40) $$

where

$$ \lambda_1 = \frac{1}{s+1}, \qquad \lambda_2 = e^{-s\tau} \qquad (13.41) $$

Identification of the system based on the data shown in Fig. 13.8 and with $\tau = 10$, $a_1 = 1$, $a_2 = 1$, $b = 1$ provides the estimates

$$ \begin{bmatrix} \hat{a}_1 \\ \hat{a}_2 \\ \hat{b} \end{bmatrix} = \begin{bmatrix} 1.0056 \\ 0.9923 \\ 0.9909 \end{bmatrix} \qquad (13.42) $$

which shows that the method works as expected. ■

Figure 13.8 A delay-differential system described by the system equation (13.39) with parameters $a_1 = 1$, $a_2 = 1$, and $b = 1$. The least-squares solution provides the estimates $\hat{a}_1 = 1.0056$, $\hat{a}_2 = 0.9923$, and $\hat{b} = 0.9909$.
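The regression of Eq. (13.38), specialized to the transfer operator of Eq. (13.40), can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the computation behind Eq. (13.42). Both the system and the filter $\lambda_1 = 1/(s+1)$ are discretized with the forward Euler method using the same step h. The discrete operators then commute exactly, so the regression $y = -(a_1 - 1)\lambda_1 y + a_2 \lambda_1\lambda_2 y + b\,\lambda_1\lambda_2 u$ holds up to rounding. The parameter values ($a_1 = 2$ instead of 1, so that the simulated system is asymptotically stable), the delay, the step and the random input are all assumptions, and the helper names `lstsq`, `lowpass` and `delay` are hypothetical.

```python
import random

def lstsq(rows, ys):
    """Least-squares solution of rows @ theta ~= ys via normal equations and Gaussian elimination."""
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))      # partial pivoting
        A[k], A[p], c[k], c[p] = A[p], A[k], c[p], c[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    theta = [0.0] * n
    for k in reversed(range(n)):
        theta[k] = (c[k] - sum(A[k][j] * theta[j] for j in range(k + 1, n))) / A[k][k]
    return theta

def lowpass(s, h):
    """lambda_1 = 1/(1 + p), forward-Euler discretized, zero initial state."""
    out = [0.0] * len(s)
    for k in range(len(s) - 1):
        out[k + 1] = out[k] + h * (s[k] - out[k])
    return out

def delay(s, d):
    """lambda_2 = exp(-p*tau): shift by d samples with zero pre-history."""
    return [0.0] * d + s[:len(s) - d]

a1, a2, b = 2.0, 1.0, 1.0        # assumed true parameters (a1 > a2 gives a stable system)
h, d, n = 0.01, 100, 4000        # step [s], delay tau = d*h = 1 s, number of samples

rng = random.Random(0)
u = []
while len(u) < n:                # piecewise-constant random input, 0.5 s per level
    u += [rng.uniform(-1.0, 1.0)] * 50
u = u[:n]

x = [0.0] * n                    # Euler simulation of Eq. (13.39), y = x
for k in range(n - 1):
    xd = x[k - d] if k >= d else 0.0
    ud = u[k - d] if k >= d else 0.0
    x[k + 1] = x[k] + h * (-a1 * x[k] + a2 * xd + b * ud)

l1y = lowpass(x, h)
rows = [[-v, w, z] for v, w, z in zip(l1y, delay(l1y, d), delay(lowpass(u, h), d))]
theta = lstsq(rows, x)           # regression y = [-l1 y, l1 l2 y, l1 l2 u] [a1 - 1, a2, b]^T
a1_hat, a2_hat, b_hat = theta[0] + 1.0, theta[1], theta[2]
print(a1_hat, a2_hat, b_hat)
```

With measured (noisy) data the exact fit disappears. The bandwidth of $\lambda_1$ and the sampling period then become the tuning knobs mentioned after Eq. (13.38).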

The method lends itself to recursive identification, and a similar approach to 
adaptive control has been reported. 


13.6 TWO-DIMENSIONAL SPECTRA 


Let $X = \{x_{mn}\}$ denote an array of data and consider the transform $Y$ with components

$$ y_{ij} = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} w_{ijmn}\, x_{mn} \qquad (13.43) $$

where $w_{ijmn}$ is some weighting function. In the special case when $w_{ijmn}$ is separable on the two axes, i.e., when $w_{ijmn} = b_{mi} a_{nj}$ for some matrix components $\{a_{ij}\}$ and $\{b_{ij}\}$, it is possible to write the relationship between the data array and its transform as

$$ Y = \frac{1}{\sqrt{MN}} B^T X A \qquad (13.44) $$

where $A$, $B$ are the matrices of components $a_{ij}$ and $b_{ij}$, respectively. A standard choice is, of course, to use the matrices in Eq. (5.83) used in the discrete Fourier transform,

$$ A = B = \Phi = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{i\omega_0 h} & e^{i\omega_1 h} & \cdots & e^{i\omega_{N-1} h} \\ \vdots & \vdots & & \vdots \\ e^{i\omega_0 (N-1)h} & e^{i\omega_1 (N-1)h} & \cdots & e^{i\omega_{N-1}(N-1)h} \end{bmatrix} \qquad (13.45) $$

for discrete frequency points $\omega_k = k(2\pi/Nh)$.
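To make Eq. (13.44) concrete, the plain-Python sketch below builds the matrix of Eq. (13.45) with entries $\Phi_{nk} = e^{i\omega_k n h} = e^{2\pi i kn/N}$. It forms $Y = B^T X A/\sqrt{MN}$ and checks the result against the double sum of Eq. (13.43) with the separable weight $w_{ijmn} = b_{mi} a_{nj}$. Taking $A = \Phi_N$ and $B = \Phi_M$ for an $M \times N$ array is an assumption that extends the square case in the text. The helper names are illustrative only.

```python
import cmath
import math

def phi(N):
    """Fourier matrix of Eq. (13.45): Phi[n][k] = exp(i*omega_k*n*h) with omega_k = 2*pi*k/(N*h)."""
    return [[cmath.exp(2j * math.pi * k * n / N) for k in range(N)] for n in range(N)]

def matmul(P, Q):
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def dft2_separable(X):
    """Y = B^T X A / sqrt(M N) with A = Phi_N and B = Phi_M, Eq. (13.44)."""
    M, N = len(X), len(X[0])
    Y = matmul(matmul(transpose(phi(M)), X), phi(N))
    s = 1.0 / math.sqrt(M * N)
    return [[s * v for v in row] for row in Y]

def dft2_direct(X):
    """Double sum of Eq. (13.43) with w_ijmn = b_mi * a_nj."""
    M, N = len(X), len(X[0])
    s = 1.0 / math.sqrt(M * N)
    return [[s * sum(X[m][n] * cmath.exp(2j * math.pi * (i * m / M + j * n / N))
                     for m in range(M) for n in range(N))
             for j in range(N)] for i in range(M)]

X = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 3.0, 1.0], [2.0, -2.0, 1.0, 0.0]]   # M = 3, N = 4
Y1, Y2 = dft2_separable(X), dft2_direct(X)
err = max(abs(p - q) for r1, r2 in zip(Y1, Y2) for p, q in zip(r1, r2))
energy_x = sum(v * v for row in X for v in row)
energy_y = sum(abs(v) ** 2 for row in Y1 for v in row)     # preserved by the 1/sqrt(MN) scaling
print(err, energy_x, energy_y)
```

The matrix form costs two matrix products instead of a quadruple loop. In practice a fast Fourier transform would be applied along each axis instead.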


The Walsh/Hadamard transform 

Two-dimensional and multidimensional transforms obviously require much 
computation, and it is natural to search for alternatives and simplifications 
of the calculation of spectra. One such transform based on binary functions is 
known as the Walsh/Hadamard transform, which is analogous to the Fourier 
transform in the sense that its result is a form of spectrum analysis. The 
binary functions known as Walsh functions form a complete, orthogonal basis 
on the interval [-0.5,0.5] of real numbers. A function f(x) defined on the 
interval [-0.5,0.5] can thus be expanded as 


$$ f(x) = a_0\, \mathrm{wal}(0,x) + \sum_{n=1}^{\infty} \left[ a_n\, \mathrm{cal}(n,x) + b_n\, \mathrm{sal}(n,x) \right] \qquad (13.46) $$

where

$$ \mathrm{wal}(0,x) = \begin{cases} 1, & -1/2 < x < 1/2 \\ 0, & x < -1/2 \text{ or } x > 1/2 \end{cases} $$

and where cal(n, x) and sal(n, x) correspond to cosine and sine functions, respectively. The first few functions cal(n, x) and sal(n, x) can be obtained as the signum function of the corresponding sinusoidal functions (see Fig. 13.9), and the higher-order Walsh functions are easily generated by means of the Hadamard matrices, as shown below.
Figure 13.10 A Walsh/Hadamard spectrum of a sinusoid and the inverse transform with sequency in increasing order. All coefficients $a_n$ are zero.

The coefficients of the Walsh series expansions are obtained as

$$ a_0 = \int_{-1/2}^{1/2} f(x)\, \mathrm{wal}(0,x)\, dx, \qquad a_n = \int_{-1/2}^{1/2} f(x)\, \mathrm{cal}(n,x)\, dx, \qquad b_n = \int_{-1/2}^{1/2} f(x)\, \mathrm{sal}(n,x)\, dx \qquad (13.47) $$

The number n determines the number of zero crossings of the functions cal(n, x) and sal(n, x), and it is customary to define this number as the sequency. This number is analogous to frequency for periodic functions (see Fig. 13.10).

Example 13.6—A Walsh/Hadamard series expansion
The discrete Walsh/Hadamard spectrum of a sinusoid is shown in Fig. 13.10 together with the inverse transform. The Walsh spectrum is shown in a diagram with $b_n$ versus sequency n (another approach is to represent some other function such as $\sqrt{a_n^2 + b_n^2}$ versus sequency n). The transform is computationally efficient and gives a spectrum that is analogous to traditional Fourier spectra. ■

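In particular, the Hadamard matrices $H_n$ with $N = 2^n$ rows and columns for $n = 1, 2, 3, \ldots$ are easy to generate recursively. An $N \times N$ matrix can be generated by the following simple matrix composition rules

$$ H_0 = 1, \qquad H_n = \begin{bmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{bmatrix} \qquad (13.48) $$

It is straightforward to verify that

$$ H H^T = N \cdot I_{N \times N} \quad \text{for } H \in \mathbb{R}^{N \times N} \qquad (13.49) $$

An example of a matrix generated this way is

$$ H_3 = \begin{bmatrix}
 1 &  1 &  1 &  1 &  1 &  1 &  1 &  1 \\
 1 & -1 &  1 & -1 &  1 & -1 &  1 & -1 \\
 1 &  1 & -1 & -1 &  1 &  1 & -1 & -1 \\
 1 & -1 & -1 &  1 &  1 & -1 & -1 &  1 \\
 1 &  1 &  1 &  1 & -1 & -1 & -1 & -1 \\
 1 & -1 &  1 & -1 & -1 &  1 & -1 &  1 \\
 1 &  1 & -1 & -1 & -1 & -1 &  1 &  1 \\
 1 & -1 & -1 &  1 & -1 &  1 &  1 & -1
\end{bmatrix} \qquad (13.50) $$

A minor inconvenience is that the Walsh functions generated in this manner from the columns or rows of this matrix are not sorted with respect to sequency.

The two-dimensional Walsh/Hadamard transform of X is accordingly

$$ Y = \frac{1}{N} H X H^T \qquad (13.51) $$

and the inverse transform is

$$ X = \frac{1}{N} H^T Y H \qquad (13.52) $$

Since the entries of H are ±1, both the transform and the inverse transform are computed with a minimum of computational complexity. Apart from the scaling by 1/N, only additions and subtractions are needed, and a fast implementation requires of the order of $N \log_2 N$ operations per row or column.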
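The recursion of Eq. (13.48), the orthogonality property of Eq. (13.49) and the transform pair of Eqs. (13.51)-(13.52) can be checked with the short plain-Python sketch below. `fwht` is the standard in-place butterfly, which uses only additions and subtractions and returns $Hv$ in the natural (Sylvester) order. `sequency` counts the sign changes along a row, which shows that the rows of Eq. (13.50) are not sorted by sequency. All function names are illustrative.

```python
def hadamard(n):
    """Sylvester recursion, Eq. (13.48): H_0 = [1], H_n = [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def sequency(row):
    """Number of sign changes along a row of H."""
    return sum(1 for p, q in zip(row, row[1:]) if p != q)

def fwht(v):
    """Fast Walsh/Hadamard transform H v (natural order); N log2 N additions/subtractions."""
    v = list(v)
    step = 1
    while step < len(v):
        for i in range(0, len(v), 2 * step):
            for j in range(i, i + step):
                p, q = v[j], v[j + step]
                v[j], v[j + step] = p + q, p - q
        step *= 2
    return v

def wht2(X):
    """Y = H X H^T / N, Eq. (13.51): fast transform of every row, then every column."""
    N = len(X)
    rows = [fwht(r) for r in X]                  # rows of X H^T
    cols = [fwht(c) for c in zip(*rows)]         # cols[j][i] = (H X H^T)[i][j]
    return [[cols[j][i] / N for j in range(N)] for i in range(N)]

def iwht2(Y):
    """X = H^T Y H / N, Eq. (13.52); the Sylvester H is symmetric, so this equals wht2."""
    return wht2(Y)

H3 = hadamard(3)
N = len(H3)
HHt = [[sum(p * q for p, q in zip(r, s)) for s in H3] for r in H3]
print([sequency(r) for r in H3])   # → [0, 7, 3, 4, 1, 6, 2, 5]

X = [[float((3 * i + 5 * j) % 7) for j in range(N)] for i in range(N)]
Y = wht2(X)
Ymat = [[sum(H3[i][m] * X[m][n] * H3[j][n] for m in range(N) for n in range(N)) / N
         for j in range(N)] for i in range(N)]   # direct evaluation of Eq. (13.51)
Xr = iwht2(Y)
```

Sorting the rows by the printed sequency values yields the sequency-ordered Walsh functions used in Fig. 13.10.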
Nonlinear System Identification


Whoever strives with ceaseless effort,
him we can redeem.
—J.W. Goethe


14.1 INTRODUCTION 

The identification of mathematical models in the form of single-input single-output linear systems is well developed both with respect to parameter estimation and structure determination. However, it is considered difficult to apply linear systems identification to nonlinear systems. A pessimistic view is to regard nonlinear systems as the complement to the class of systems for which systematic identification methods exist. A similar general criticism of linear systems identification in applications is that a priori structural knowledge is difficult to include in any useful way. It can be concluded that nonlinear systems identification in general is complicated because there are problems both of parameter estimation and of structure determination (see Mehra, 1979; Billings and Leontaritis, 1982).

Nonlinear systems are often represented by means of differential equations of the form

$$ \dot{x}(t) = f(x(t)) + g(x(t), u(t)) + v(t), \qquad y(t) = h(x(t), u(t)) \qquad (14.1) $$

where f, g, and h are some functions and where x, u, v are a state vector, system input, and disturbance, respectively. It is sometimes restrictive to assume the form in Eq. (14.1), and in such cases it might be preferable to use the relationship

$$ F(\dot{x}(t), x(t), u(t), v(t)) = 0 \qquad (14.2) $$

where F is some nonlinear function. Discrete-time dynamics is often represented in the form

$$ x_{k+1} = f(x_k, u_k) \qquad (14.3) $$

although such systems are not as much used as the differential equation models in Eq. (14.1). A restricted class of discrete-time models are the cascaded block structure models shown in Fig. 14.1. These models consist of ARMAX-type models with input or output nonlinearities such as

$$ y_k = \frac{B(z^{-1})}{A(z^{-1})}\, F(u_k) \qquad (14.4) $$
Models with input nonlinearities F of the type in Eq. (14.4) are known as Hammerstein models, whereas models with output nonlinearities Eq. (14.5) are a subset of what is known as Wiener models. The systems represented by Eq. (14.5) can often be identified using straightforward extensions of linear estimation models with, for instance, a polynomial expansion to represent the nonlinearity F or its inverse (Wigren, 1990).

A nonlinear systems representation with a potential for time domain as well as frequency domain calculations is provided by the Volterra kernel representation, which leads to multilinear extensions of the transfer function notions used in linear systems. Identification of such multilinear transfer functions traditionally relies on frequency domain methods, correlation analysis (Barker, 1982; Barrett, 1982), or orthogonal expansions. The use of orthogonal polynomials is motivated in this context because terms can be added to the series and new parameters be evaluated without changing the previously obtained estimates.

A natural requirement is to have methods suitable both for ordinary identification and for recursive methods. It is also preferable to have methods that allow operations both in the frequency domain and in the time domain. The identification methods should allow estimation of unknown parameters of a system with a known structure described by ordinary differential equations. In particular, the real-time identification of a small number of varying parameters is often of interest for detection and adaptation. Such problems often appear in models derived from equations of classical mechanics, such as rigid-body motion of vehicles and robot manipulators, and in biomechanics.


14.2 WIENER MODELS 


The most systematic outline of nonlinear system identification is the Wiener (1958) approach, which involves Laguerre and Hermite series expansions to model the dynamic and the nonlinear aspects, respectively. The model outputs are formed as an infinite series of products of Hermite polynomials in the Laguerre coefficients of the past of the input (see Fig. 14.2). The Laguerre operators used can be viewed as a sequence of filters



(14.6) 
Figure 14.2 The Wiener network for identification of a nonlinear system.


where each transfer function element

$$ \frac{1 - s\tau}{1 + s\tau} \qquad (14.7) $$

has a constant gain for all $s = i\omega$ and a phase shift $-2\arctan\omega\tau$. That is, each such link represents an all-pass filter, which makes $L_k(s)$ reminiscent of the delay line used for ARMAX models. Consequently, it is natural to regard the filtered inputs $x_k(t) = \mathcal{L}^{-1}\{L_k(s)\,U(s)\}$ as a kind of state-space representation of the system input.
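The all-pass property of Eq. (14.7) is easy to confirm numerically. The plain-Python sketch below evaluates the element at $s = i\omega$ and compares its phase with $-2\arctan\omega\tau$. The time constant $\tau = 0.5$ and the frequency grid are arbitrary choices, and `allpass` is a hypothetical helper name.

```python
import cmath
import math

def allpass(w, tau):
    """Gain and phase [rad] of (1 - s*tau)/(1 + s*tau) evaluated at s = i*w, Eq. (14.7)."""
    G = (1 - 1j * w * tau) / (1 + 1j * w * tau)
    return abs(G), cmath.phase(G)

tau = 0.5                                   # assumed time constant
for w in (0.1, 1.0, 10.0, 100.0):
    gain, phase = allpass(w, tau)
    # the gain stays at 1 while the phase follows -2*atan(w*tau) toward -pi
    print(f"w = {w:6.1f}  gain = {gain:.12f}  phase = {phase:+.6f}  "
          f"-2*atan(w*tau) = {-2 * math.atan(w * tau):+.6f}")
```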

The Hermite polynomials used in the series expansion may be obtained from

$$ H_k(x) = (-1)^k e^{x^2} \frac{d^k}{dx^k}\left(e^{-x^2}\right), \qquad k = 0, 1, 2, \ldots \qquad (14.8) $$

A few examples are

$$ H_0(x) = 1, \qquad H_1(x) = 2x, \qquad H_2(x) = 4x^2 - 2, \qquad H_3(x) = 8x^3 - 12x \qquad (14.9) $$

Other important relationships are that the Hermite polynomials $H_k(x)$ satisfy the recursive equation

$$ H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x) \qquad (14.10) $$

and the differential equation

$$ H_k''(x) - 2x H_k'(x) + 2k H_k(x) = 0 \qquad (14.11) $$
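The recursion of Eq. (14.10) is a convenient way to generate the polynomials. In the plain-Python sketch below, a polynomial is represented by its list of integer coefficients, lowest power first (a representation chosen here for illustration). The sketch reproduces Eq. (14.9) and checks the differential equation (14.11) coefficient by coefficient, so the check is exact.

```python
def hermite(kmax):
    """Coefficient lists (lowest power first) of H_0..H_kmax from Eq. (14.10)."""
    H = [[1], [0, 2]]                                      # H_0 = 1, H_1 = 2x
    for k in range(1, kmax):
        two_x_hk = [0] + [2 * c for c in H[k]]             # 2x H_k
        prev = H[k - 1] + [0] * (len(two_x_hk) - len(H[k - 1]))
        H.append([p - 2 * k * q for p, q in zip(two_x_hk, prev)])
    return H[:kmax + 1]

def deriv(p):
    """Coefficients of dp/dx."""
    return [i * c for i, c in enumerate(p)][1:] or [0]

def ode_residual(p, k):
    """Coefficients of H'' - 2x H' + 2k H, Eq. (14.11); all zero when p is H_k."""
    def pad(q):
        return q + [0] * (len(p) - len(q))
    d1, d2 = deriv(p), deriv(deriv(p))
    x_d1 = [0] + d1                                        # x H'
    return [a - 2 * b + 2 * k * c for a, b, c in zip(pad(d2), pad(x_d1), p)]

H = hermite(6)
print(H[:4])   # → [[1], [0, 2], [-2, 0, 4], [0, -12, 0, 8]]
```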
The Hermite polynomials are formed from the Laguerre network outputs $\{x_k(t)\}$, and the model output is formed as

$$ y(t) = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_n=0}^{\infty} c_{k_1 k_2 \cdots k_n} H_{k_1}(x_1(t))\, H_{k_2}(x_2(t)) \cdots H_{k_n}(x_n(t)) \qquad (14.12) $$

where $\{c_{k_1 k_2 \cdots k_n}\}$ is the set of coefficients to be estimated. In the case of a white noise input it is, for instance, possible to use a correlation technique to solve for the coefficients,

$$ c_{k_1 k_2 \cdots k_n} = \ldots \qquad (14.13) $$

which follows from the orthogonal properties of the Hermite polynomials $H_k(x)$, provided that the measurement duration T is large enough and that the input is white noise. A problem in the context of identification is the overabundant number of coefficients $\{c_{k_1 k_2 \cdots k_n}\}$ that need to be estimated. In fact, this restriction is so severe that the method is rarely implemented. The Wiener approach, however, has had an important historical impact on identification, and several specialized methods owe a debt to this idea.


14.3 VOLTERRA-WIENER MODELS

The multidimensional Laplace and Fourier transforms introduced in Chapter 13 have an important application also for nonlinear systems.

$$ F(s_1, \ldots, s_n) = \mathcal{L}_n\{f(t_1, \ldots, t_n)\} = \int_0^{\infty} \cdots \int_0^{\infty} e^{-s_1 t_1 - \cdots - s_n t_n} f(t_1, \ldots, t_n)\, dt_1 \cdots dt_n \qquad (14.14) $$

The multidimensional Laplace transform with the vector $s_1, \ldots, s_n$ of Laplace variables and the time variables

$$ t_1, \ldots, t_n \qquad (14.15) $$

has an immediate application for frequency domain descriptions of nonlinear systems. Let us consider an nth-order convolution that defines the input-output map of a nonlinear system described by an n-dimensional weighting function or Volterra kernel $g_n$. More generally, a nonlinear system output y from the system equation

$$ \dot{x}(t) = f(x(t)) + g^T(x(t))\, u(t), \qquad y(t) = h(x(t)) \qquad (14.16) $$

may be represented as the Volterra series

$$ y(t) = g_0(t) + \sum_{k=1}^{n} \int_0^{\infty} \cdots \int_0^{\infty} g_k(\tau_1, \ldots, \tau_k)\, u(t-\tau_1) \cdots u(t-\tau_k)\, d\tau_1 \cdots d\tau_k \qquad (14.17) $$

where $g_0(t)$ denotes the dependency on initial values. Notice that all causal Volterra kernels $g_k(t_1, \ldots, t_k)$ satisfy the relationship $g_k(t_1, \ldots, t_k) = 0$ for any $t_i < 0$ and $i = 1, \ldots, k$.


The Volterra series expansion in Eq. (14.17) is clearly a natural extension of the convolution relationships in linear dynamical systems. The multidimensional Laplace transform of $g_k$ results in a transfer function

$$ G_k(s_1, \ldots, s_k) = \int_0^{\infty} \cdots \int_0^{\infty} g_k(t_1, \ldots, t_k) \exp\Big(-\sum_{i=1}^{k} s_i t_i\Big)\, dt_1 \cdots dt_k \qquad (14.18) $$

or

$$ \mathcal{L}_k\{g_k(t_1, \ldots, t_k)\} = G_k(s_1, \ldots, s_k) \qquad (14.19) $$

The frequency domain description of the convolution Eq. (14.17) is then for each component

$$ Y_k(s_1, \ldots, s_k) = G_k(s_1, \ldots, s_k)\, U(s_1) \cdots U(s_k) \qquad (14.20) $$

so that

$$ Y(s_1, \ldots, s_n) = G_0(s_1) + \sum_{k=1}^{n} G_k(s_1, \ldots, s_k)\, U(s_1) \cdots U(s_k) \qquad (14.21) $$


Consider, for instance, the nonlinearly interconnected transfer function of the block diagram in Fig. 14.3. A convolution relationship between input and output variables is

$$ z(t) = \int_0^{\infty} g_1(\tau) u(t-\tau)\, d\tau \int_0^{\infty} g_2(\tau) u(t-\tau)\, d\tau = \int_0^{\infty}\!\!\int_0^{\infty} g_1(\tau_1) g_2(\tau_2)\, u(t-\tau_1) u(t-\tau_2)\, d\tau_1 d\tau_2 \qquad (14.22) $$

and

$$ y(t) = \int_0^{\infty} g_3(\tau_3) z(t-\tau_3)\, d\tau_3 = \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty} g(\tau_1, \tau_2, \tau_3)\, u(t-\tau_3-\tau_1) u(t-\tau_3-\tau_2)\, d\tau_1 d\tau_2 d\tau_3 \qquad (14.23) $$

where the weighting function of the convolution is

$$ g(\tau_1, \tau_2, \tau_3) = g_1(\tau_1)\, g_2(\tau_2)\, g_3(\tau_3) \qquad (14.24) $$

Let us instead consider the output variable defined for two time variables $t_1$ and $t_2$ so that

$$ z = z(t_1, t_2) = \int_0^{\infty} g_1(\tau_1) u(t_1-\tau_1)\, d\tau_1 \int_0^{\infty} g_2(\tau_2) u(t_2-\tau_2)\, d\tau_2 \qquad (14.25) $$

If we try to find the input-output relationship and thus disregard the initial conditions, then it is straightforward to apply the two-dimensional Laplace transform in the complex frequency variables $s_1$ and $s_2$ so that

$$ Z(s_1, s_2) = \mathcal{L}_2\{z(t_1, t_2)\} = G_1(s_1)\, G_2(s_2)\, U(s_1)\, U(s_2) \qquad (14.26) $$

where the transfer functions $G_1$, $G_2$ are the ordinary one-dimensional Laplace transforms of the weighting functions $g_1$ and $g_2$, respectively.

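We now consider the output variable

$$ y(t_1, t_2) = \left(\frac{1}{2\pi i}\right)^2 \int_{\sigma_1 - i\infty}^{\sigma_1 + i\infty} \int_{\sigma_2 - i\infty}^{\sigma_2 + i\infty} Y(s_1, s_2)\, e^{s_1 t_1 + s_2 t_2}\, ds_1\, ds_2 \qquad (14.27) $$

and introduce $t = t_1 = t_2$ and $s = s_1 + s_2$ so that

$$ y(t) = y(t, t) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} Y_1(s)\, e^{st}\, ds, \qquad \sigma = \sigma_1 + \sigma_2 \qquad (14.28) $$

where $Y_1(s)$ is the ordinary one-dimensional Laplace transform of y(t). By equating Eq. (14.27) with Eq. (14.28) we find that $Y_1(s)$ must satisfy the relation

$$ Y_1(s) = \frac{1}{2\pi i} \int_{\sigma_2 - i\infty}^{\sigma_2 + i\infty} Y(s - s_2, s_2)\, ds_2 \qquad (14.29) $$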
Figure 14.4 A block diagram containing transfer functions $G_1$, $G_2$, $G_3$ interconnected in a nonlinear manner.


Moreover, the one-dimensional Laplace transforms $Y_1(s)$ and $Z_1(s) = \mathcal{L}\{z(t)\}$ satisfy the transfer function relationships

$$ Y_1(s) = G_3(s)\, Z_1(s) = G_3(s_1 + s_2)\, Z_1(s), \qquad Y(s_1, s_2) = G_3(s_1 + s_2)\, Z(s_1, s_2) \qquad (14.30) $$

A transform algebra relevant for nonlinear systems can now be suggested as follows

$$ G_1(s_1)\, G_2(s_2) = \mathcal{L}_2\{g_1(\tau_1)\, g_2(\tau_2)\} \qquad (14.31) $$

$$ Z(s_1, s_2) = \mathcal{L}_2\{z(t_1, t_2)\} = G_1(s_1) G_2(s_2) U(s_1) U(s_2), \qquad Y(s_1, s_2) = G_1(s_1) G_2(s_2) G_3(s_1 + s_2) U(s_1) U(s_2) \qquad (14.32) $$

Thus, the multiplicative interaction gives rise to frequency domain models where, for some constant $s_0$, a complex exponential $e^{s_0 t}$ applied to the system input does not propagate through the system independently of other input components.


Example 14.1—A transfer function of a nonlinear system
Consider the transfer functions

$$ G_1(s) = \frac{1}{s+1}, \qquad G_2(s) = \frac{2}{s^2 + 0.01 s + 2}, \qquad G_3(s) = \frac{3}{s+3} \qquad (14.33) $$

interconnected according to Fig. 14.4. The resulting transfer function is

$$ G(s_1, s_2) = \frac{1}{s_1 + 1} \cdot \frac{2}{s_2^2 + 0.01 s_2 + 2} \cdot \frac{3}{s_1 + s_2 + 3} \qquad (14.34) $$

[Figure panels: Bode diagram - Gain; Bode diagram - Phase]

Figure 14.5 A Bode diagram for the transfer function in Example 14.1 in two variables $s_1$ and $s_2$ evaluated along the imaginary axes.

It is clear from the example that nonlinear systems composed of cascaded linear systems with multiplicative interconnections can also be described by transfer functions. The corresponding Bode diagram is shown in Fig. 14.5. ■
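Evaluating the two-variable transfer function of Eq. (14.34) on the imaginary axes gives the kind of gain and phase surfaces plotted in Fig. 14.5. The plain-Python sketch below does this on a coarse, arbitrarily chosen frequency grid. The lightly damped factor $G_2$ produces a sharp ridge near $\omega_2 \approx \sqrt{2}$ rad/s.

```python
import cmath

def G1(s):
    return 1 / (s + 1)

def G2(s):
    return 2 / (s * s + 0.01 * s + 2)

def G3(s):
    return 3 / (s + 3)

def G(s1, s2):
    """Eq. (14.34): G1(s1) G2(s2) G3(s1 + s2)."""
    return G1(s1) * G2(s2) * G3(s1 + s2)

ws = [0.0, 0.5, 1.0, 1.4, 2.0, 5.0]          # frequency grid [rad/s], arbitrary
gain = [[abs(G(1j * w1, 1j * w2)) for w2 in ws] for w1 in ws]
phase = [[cmath.phase(G(1j * w1, 1j * w2)) for w2 in ws] for w1 in ws]
print(G(0, 0))   # → 1.0, the static gain G1(0) G2(0) G3(0)
```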

The method is, however, difficult to extend to systems with feedback, and for this reason the method is of limited value. A drawback in the context of identification is that the two-dimensional representation $y(t_1, t_2)$ is rarely known. It is for this reason difficult to apply methods of spectrum analysis to estimation of transfer functions.

There are, of course, several reasons why the frequency domain approach has limited applicability. One reason for the success of complex exponentials applied to the input of a linear system is that each component propagates through the system independently of other components and is only affected in gain and phase delay. This is not the case for nonlinear systems, and it is hard to find other classes of functions with similar properties. Hence, for nonlinear systems there is in general little point in giving special attention to inputs in the form of complex exponentials. Another point of criticism is the difficulty of relating the estimated Volterra kernels to a priori information (see Fig. 14.6).

[Figure panels: Volterra kernel; 2D transfer function magnitude]

Figure 14.6 Volterra kernel between input u and variable z in Example 14.1 estimated by means of the correlation method in the case of a white-noise input.


14.4 POWER SERIES EXPANSIONS 

Consider the nonlinear system

$$ \dot{x}(t) = \sum_{i=1}^{k} a_i x^i(t) + \sum_{i=0}^{k} \sum_{j=1}^{m} b_{ij} x^i(t) u^j(t) + v(t) \qquad (14.35) $$

where the identification problem consists in finding the coefficients $\{a_i\}$ and $\{b_{ij}\}$ given observations of x and u. The system equations (14.35) are clearly of the form

$$ \dot{x}(t) = f(x(t)) + g(x(t), u(t)) + v(t) \qquad (14.36) $$

As we consider x and u to be the observed variables, we can formulate basic time-domain and frequency-domain methods by means of linear regression methods and their extensions. We illustrate these methods by means of the following examples.


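Example 14.2—Time-domain identification of a nonlinear system
Consider the nonlinear system equation

$$ \dot{x} = -a_1 x - b_{11} x u + b_{01} u \qquad (14.37) $$

Let us now evaluate the following integrals over a sequence of intervals determined by the points $\{t_k\}_{k=0}^{N}$

$$ \int_{t_k}^{t_{k+1}} \dot{x}\, dt = -a_1 \int_{t_k}^{t_{k+1}} x(t)\, dt - b_{11} \int_{t_k}^{t_{k+1}} x(t) u(t)\, dt + b_{01} \int_{t_k}^{t_{k+1}} u(t)\, dt \qquad (14.38) $$

where the unknown parameters are

$$ \theta = \begin{bmatrix} a_1 & b_{11} & b_{01} \end{bmatrix}^T \qquad (14.39) $$

These integral relationships derived from Eq. (14.38) provide the equation

$$ \begin{bmatrix} x(t_1) - x(t_0) \\ \vdots \\ x(t_N) - x(t_{N-1}) \end{bmatrix} = \begin{bmatrix} -\int_{t_0}^{t_1} x\, dt & -\int_{t_0}^{t_1} xu\, dt & \int_{t_0}^{t_1} u\, dt \\ \vdots & \vdots & \vdots \\ -\int_{t_{N-1}}^{t_N} x\, dt & -\int_{t_{N-1}}^{t_N} xu\, dt & \int_{t_{N-1}}^{t_N} u\, dt \end{bmatrix} \begin{bmatrix} a_1 \\ b_{11} \\ b_{01} \end{bmatrix} \qquad (14.40) $$

By formulating Eq. (14.40) in the form $\gamma = \Phi\theta$ it is straightforward to solve the estimation problem as a least-squares problem, i.e., $\hat\theta = (\Phi^T\Phi)^{-1}\Phi^T\gamma$, or as some extension of a linear estimation problem. ■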
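The plain-Python sketch below illustrates the integral method of Eq. (14.40). It is not the book's own computation. The data are simulated with Heun's method, the interval integrals are approximated with the trapezoidal rule, and the true parameters $a_1 = 1$, $b_{11} = 2$, $b_{01} = 3$ are borrowed from Example 14.5. The input signal (kept positive so that x stays bounded), the step size and the 0.5 s interval length are assumptions. `lstsq` is a hypothetical helper that solves the normal equations.

```python
import math

def lstsq(rows, ys):
    """Least-squares solution of rows @ theta ~= ys via normal equations and Gaussian elimination."""
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))      # partial pivoting
        A[k], A[p], c[k], c[p] = A[p], A[k], c[p], c[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    theta = [0.0] * n
    for k in reversed(range(n)):
        theta[k] = (c[k] - sum(A[k][j] * theta[j] for j in range(k + 1, n))) / A[k][k]
    return theta

a1, b11, b01 = 1.0, 2.0, 3.0                 # assumed true parameters
h, n, m = 1e-3, 20000, 500                   # step [s], steps (T = 20 s), samples per interval
t = [k * h for k in range(n + 1)]
u = [1.0 + 0.5 * math.sin(tk) + 0.4 * math.sin(2.3 * tk) for tk in t]   # u >= 0.1

def rhs(x, uk):
    """Right-hand side of Eq. (14.37)."""
    return -a1 * x - b11 * x * uk + b01 * uk

x = [0.0] * (n + 1)
for k in range(n):                           # Heun's method
    k1 = rhs(x[k], u[k])
    k2 = rhs(x[k] + h * k1, u[k + 1])
    x[k + 1] = x[k] + 0.5 * h * (k1 + k2)
xu = [xk * uk for xk, uk in zip(x, u)]

def trap(g, i0, i1):
    """Trapezoidal approximation of the integral of samples g over [t_i0, t_i1]."""
    return h * (0.5 * g[i0] + sum(g[i0 + 1:i1]) + 0.5 * g[i1])

rows, gamma = [], []
for i0 in range(0, n, m):                    # one row of Eq. (14.40) per interval
    i1 = i0 + m
    gamma.append(x[i1] - x[i0])
    rows.append([-trap(x, i0, i1), -trap(xu, i0, i1), trap(u, i0, i1)])
theta = lstsq(rows, gamma)                   # estimates of [a1, b11, b01]
print(theta)
```

When x follows u almost statically, the three columns become nearly collinear. The transient from x(0) = 0 and the two input frequencies are what keep the regression well conditioned in this sketch.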

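Example 14.3—Frequency-domain method for a nonlinear system
Consider the nonlinear system equation and its Laplace transform

$$ \dot{x} = -a_1 x - b_{11} x u + b_{01} u, \qquad sX(s) = -a_1 X(s) - b_{11} \mathcal{L}\{x(t)u(t)\} + b_{01} U(s) \qquad (14.41) $$

By evaluating

$$ y_k = i\omega_k X(i\omega_k), \qquad \phi_k^T = \begin{bmatrix} -X(i\omega_k) & -\mathcal{L}\{x(t)u(t)\}\big|_{s = i\omega_k} & U(i\omega_k) \end{bmatrix} \qquad (14.42) $$

we can arrange this problem as an ordinary least-squares problem with

$$ \gamma = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}, \qquad \Phi = \begin{bmatrix} \phi_1^T \\ \vdots \\ \phi_N^T \end{bmatrix} \qquad (14.43) $$

with the solution $\hat\theta = (\Phi^T\Phi)^{-1}\Phi^T\gamma$. Here the real and imaginary parts of the complex equations are stacked as separate rows, so that $\gamma$ and $\Phi$ are real and $\hat\theta$ is real. ■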

An attractive property of the methods presented in the examples is that standard statistical validation tests are applicable without extensive modifications.

A standard problem for the frequency-domain methods based on the discrete Fourier transform is that the frequency points chosen have equal spacing in frequency. This implies that the high-frequency fit is favored if no special methods are applied.

We illustrate these methods by means of a differential equation that has received much interest recently.

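Example 14.4—Identification of the Lorenz model
Consider the Lorenz equations

$$ \dot{x} = \sigma(y - x), \qquad \dot{y} = \rho x - y - xz, \qquad \dot{z} = -\beta z + xy, \qquad \text{with } \theta = \begin{bmatrix} \sigma \\ \rho \\ \beta \end{bmatrix} = \begin{bmatrix} 10 \\ 28 \\ 8/3 \end{bmatrix} \qquad (14.44) $$

with trajectories according to Fig. 14.7. Integration of Eq. (14.44) over the interval [0, T] gives

$$ \begin{bmatrix} x(T) - x(0) \\ y(T) - y(0) + \int_0^T (y + xz)\, dt \\ z(T) - z(0) - \int_0^T xy\, dt \end{bmatrix} = \begin{bmatrix} \int_0^T (y - x)\, dt & 0 & 0 \\ 0 & \int_0^T x\, dt & 0 \\ 0 & 0 & -\int_0^T z\, dt \end{bmatrix} \theta \qquad (14.45) $$

and, numerically,

$$ \begin{bmatrix} 0.0205 \\ -2.6199 \\ -1.6375 \end{bmatrix} \cdot 10^3 = \begin{bmatrix} 2.0535 & 0 & 0 \\ 0 & -93.5693 & 0 \\ 0 & 0 & -614.0740 \end{bmatrix} \theta \qquad (14.46) $$

which provide a solution $\hat\theta = [\,10.00 \;\; 27.99 \;\; 2.667\,]^T$ with good accuracy.

[Figure panels: trajectories of the Lorenz states versus time [s]]

Figure 14.7 The Lorenz equations describe a nonlinear dynamical system with three states x, y, z and with three parameters σ, ρ, and β.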
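The integrated Lorenz equations can be reproduced in plain Python. This sketch is an illustration under assumptions, not the computation behind Eq. (14.46). It simulates Eq. (14.44) with a fixed-step fourth-order Runge-Kutta scheme from an assumed initial state. The integrals are approximated with the trapezoidal rule over consecutive 0.2 s sub-intervals rather than over a single interval [0, T]. Each parameter is then estimated by its own scalar least-squares problem, which is possible because each integrated equation contains only one of the parameters.

```python
def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (14.44)."""
    x, y, z = v
    return [sigma * (y - x), rho * x - y - x * z, -beta * z + x * y]

def rk4_step(v, h):
    k1 = lorenz(v)
    k2 = lorenz([a + 0.5 * h * b for a, b in zip(v, k1)])
    k3 = lorenz([a + 0.5 * h * b for a, b in zip(v, k2)])
    k4 = lorenz([a + h * b for a, b in zip(v, k3)])
    return [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for a, p, q, r, s in zip(v, k1, k2, k3, k4)]

h, n, m = 1e-3, 20000, 200          # step [s], steps (T = 20 s), samples per sub-interval
traj = [[1.0, 1.0, 20.0]]           # assumed initial state
for _ in range(n):
    traj.append(rk4_step(traj[-1], h))
X, Y, Z = (list(c) for c in zip(*traj))

def trap(g, i0, i1):
    """Trapezoidal rule over samples i0..i1."""
    return h * (0.5 * g[i0] + sum(g[i0 + 1:i1]) + 0.5 * g[i1])

y_plus_xz = [y + x * z for x, y, z in zip(X, Y, Z)]
xy = [x * y for x, y in zip(X, Y)]
y_minus_x = [y - x for x, y in zip(X, Y)]

num, den = [0.0] * 3, [0.0] * 3
for i0 in range(0, n, m):
    i1 = i0 + m
    lhs = [X[i1] - X[i0],
           Y[i1] - Y[i0] + trap(y_plus_xz, i0, i1),
           Z[i1] - Z[i0] - trap(xy, i0, i1)]
    reg = [trap(y_minus_x, i0, i1), trap(X, i0, i1), -trap(Z, i0, i1)]
    for j in range(3):              # three decoupled scalar least-squares problems
        num[j] += reg[j] * lhs[j]
        den[j] += reg[j] * reg[j]
sigma_hat, rho_hat, beta_hat = (nj / dj for nj, dj in zip(num, den))
print(sigma_hat, rho_hat, beta_hat)
```

Using many short sub-intervals guards against a single interval where an integral such as $\int (y - x)\,dt$ happens to be close to zero.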


Modulating functions 

Let us consider the differential equation model in Example 14.2 and introduce the modulating functions $\{p_k(t)\}_{k=1}^{N}$. Multiplication of all terms of the differential equation model to be identified by $p_k$ and subsequent integration gives the relationship

$$ \int_0^T p_k \dot{x}\, dt = -a_1 \int_0^T p_k x\, dt - b_{11} \int_0^T p_k x u\, dt + b_{01} \int_0^T p_k u\, dt \qquad (14.47) $$

for each function $p_k(t)$. The modulating functions can be chosen as any set of differentiable functions, e.g., polynomials in data such as $p_k(t) = x^k(t)$, trigonometric functions $p_k(t) = \sin(k\omega t)$, or some set of orthogonal functions over an interval [0, T].

A relevant problem is how to evaluate the left-hand integral in Eq. (14.47), since $\dot{x}$ might not be available to measurement. This problem is eliminated if the modulating functions $\{p_k\}$ are chosen such that their time derivatives $\{\dot{p}_k\}$ exist. Integrating by parts we have

$$ \int_0^T p_k(t)\, \dot{x}(t)\, dt = \big[p_k(t)\, x(t)\big]_0^T - \int_0^T \dot{p}_k(t)\, x(t)\, dt \qquad (14.48) $$

Figure 14.8 Input and output to a system described by the equation $\dot{x} = -a_1 x - b_{11} x u + b_{01} u$. Modulating functions chosen as sinusoids are shown in the lower graphs.

Computation of the integral of Eq. (14.47) for N different modulating functions $\{p_k(t)\}_{k=1}^{N}$ yields

$$ \begin{bmatrix} \int_0^T p_1 \dot{x}\, dt \\ \vdots \\ \int_0^T p_N \dot{x}\, dt \end{bmatrix} = \begin{bmatrix} -\int_0^T p_1 x\, dt & -\int_0^T p_1 x u\, dt & \int_0^T p_1 u\, dt \\ \vdots & \vdots & \vdots \\ -\int_0^T p_N x\, dt & -\int_0^T p_N x u\, dt & \int_0^T p_N u\, dt \end{bmatrix} \begin{bmatrix} a_1 \\ b_{11} \\ b_{01} \end{bmatrix} \qquad (14.49) $$

By formulating Eq. (14.49) in the form $\gamma = \Phi\theta$ it is straightforward to solve the estimation problem as a linear estimation problem.
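The plain-Python sketch below illustrates Eqs. (14.47)-(14.49). The left-hand side is computed through the integration by parts of Eq. (14.48), so $\dot{x}$ is never used. The six sinusoidal modulating functions are those of Example 14.5, and the true parameters are those of that example. The record length $T = 6\pi$, the input signal, the Heun simulation and the trapezoidal quadrature are assumptions made for the illustration. `lstsq` and `modulating_pair` are hypothetical helper names.

```python
import math

def lstsq(rows, ys):
    """Least-squares solution of rows @ theta ~= ys via normal equations and Gaussian elimination."""
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))      # partial pivoting
        A[k], A[p], c[k], c[p] = A[p], A[k], c[p], c[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    theta = [0.0] * n
    for k in reversed(range(n)):
        theta[k] = (c[k] - sum(A[k][j] * theta[j] for j in range(k + 1, n))) / A[k][k]
    return theta

a1, b11, b01 = 1.0, 2.0, 3.0                  # parameters of Example 14.5
T, n = 6 * math.pi, 30000                     # record length (assumed) and number of steps
h = T / n
t = [k * h for k in range(n + 1)]
u = [1.0 + 0.5 * math.sin(0.7 * tk) + 0.4 * math.sin(2.3 * tk) for tk in t]   # assumed input

x = [0.0] * (n + 1)
for k in range(n):                            # Heun's method for xdot = -a1 x - b11 x u + b01 u
    k1 = -a1 * x[k] - b11 * x[k] * u[k] + b01 * u[k]
    xp = x[k] + h * k1
    k2 = -a1 * xp - b11 * xp * u[k + 1] + b01 * u[k + 1]
    x[k + 1] = x[k] + 0.5 * h * (k1 + k2)

def trap(g):
    """Trapezoidal rule over [0, T] for samples g on the grid t."""
    return h * (0.5 * g[0] + sum(g[1:-1]) + 0.5 * g[-1])

def modulating_pair(kind, w):
    """p(t) and dp/dt for p = sin(w t) or p = cos(w t)."""
    if kind == "sin":
        return (lambda s: math.sin(w * s)), (lambda s: w * math.cos(w * s))
    return (lambda s: math.cos(w * s)), (lambda s: -w * math.sin(w * s))

mods = [modulating_pair(kind, w) for kind in ("sin", "cos") for w in (1 / 3, 2 / 3, 1.0)]

rows, gamma = [], []
for p, dp in mods:
    pv = [p(tk) for tk in t]
    dpv = [dp(tk) for tk in t]
    # Eq. (14.48): int p xdot dt = [p x]_0^T - int pdot x dt
    gamma.append(pv[-1] * x[-1] - pv[0] * x[0] - trap([q * v for q, v in zip(dpv, x)]))
    rows.append([-trap([q * v for q, v in zip(pv, x)]),
                 -trap([q * v * w for q, v, w in zip(pv, x, u)]),
                 trap([q * w for q, w in zip(pv, u)])])
theta = lstsq(rows, gamma)                    # estimates of [a1, b11, b01], Eq. (14.49)
print(theta)
```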


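Example 14.5—Identification by means of modulating functions
Consider the input-output data in Fig. 14.8, which are observations u(t) and x(t) of the system $\dot{x}(t) = -a_1 x(t) - b_{11} x(t) u(t) + b_{01} u(t)$ with

$$ \theta = \begin{bmatrix} a_1 \\ b_{11} \\ b_{01} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad (14.50) $$

and choose the modulating functions

$$ \begin{bmatrix} p_1(t) \\ p_2(t) \\ p_3(t) \\ p_4(t) \\ p_5(t) \\ p_6(t) \end{bmatrix} = \begin{bmatrix} \sin(t/3) \\ \sin(2t/3) \\ \sin(t) \\ \cos(t/3) \\ \cos(2t/3) \\ \cos(t) \end{bmatrix} \qquad (14.51) $$

This gives the result

$$ \begin{bmatrix} 0.3509 \\ -0.3130 \\ 1.3795 \\ 10.0171 \\ 13.3350 \\ 13.3350 \end{bmatrix} = \begin{bmatrix} -6.0424 & -22.3855 & 17.0548 \\ -1.3142 & -5.4896 & 3.9935 \\ -6.6571 & -61.8896 & 43.9386 \\ -1.0062 & -7.3220 & 8.5557 \\ -1.6966 & -13.3086 & 13.8829 \\ -1.6966 & -13.3086 & 13.8829 \end{bmatrix} \begin{bmatrix} a_1 \\ b_{11} \\ b_{01} \end{bmatrix} \qquad (14.52) $$

and we obtain the least-squares estimate $[\,\hat{a}_1 \;\; \hat{b}_{11} \;\; \hat{b}_{01}\,]^T = (\ldots)$.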



Equation error methods 

Prediction error methods are for several reasons popular in the domain of discrete-time identification, but they are somewhat difficult to apply to continuous-time systems. One approach to formulating such methods in the context of nonlinear system identification is the following.

Define the error

$$ e(t, \theta) = \dot{x} - f(x) - g(x)\, u(t) $$

and introduce the error functional

$$ J(\theta) = \frac{1}{2} \int_0^T e^T(t, \theta)\, e(t, \theta)\, dt \qquad (14.53) $$

with the gradient

$$ \nabla_\theta J(\theta) = \int_0^T e^T(t, \theta)\, \nabla_\theta e(t, \theta)\, dt \qquad (14.54) $$

The equation error estimate $\hat\theta$ is taken as the solution to the equation $\nabla_\theta J(\theta) = 0$, i.e., where the gradient of the loss function is zero.

Figure 14.9 Input and output to a system described by the equation $\dot{x} = -a_1 x - b_{11} x u + b_{01} u$.

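Example 14.6—Identification by means of equation error estimation
Consider the equation

$$ \dot{x} = -a_1 x - b_{11} x u + b_{01} u \qquad (14.55) $$

with the equation error

$$ e = \dot{x} + a_1 x + b_{11} x u - b_{01} u = \dot{x} + \begin{bmatrix} x & xu & -u \end{bmatrix} \begin{bmatrix} a_1 \\ b_{11} \\ b_{01} \end{bmatrix} \qquad (14.56) $$

and the gradient

$$ \nabla_\theta e = \begin{bmatrix} x \\ xu \\ -u \end{bmatrix} \qquad (14.57) $$

Then

$$ 0 = \nabla_\theta J = \int_0^T e\, \nabla_\theta e\, dt = \int_0^T \dot{x}\, \nabla_\theta e\, dt + \Big(\int_0^T \nabla_\theta e\, (\nabla_\theta e)^T dt\Big)\theta \qquad (14.58) $$

so that

$$ \hat\theta = -\Big(\int_0^T \nabla_\theta e\, (\nabla_\theta e)^T dt\Big)^{-1} \int_0^T \dot{x}\, \nabla_\theta e\, dt \qquad (14.59) $$

and it is thus straightforward to estimate θ by means of linear estimation methods (see Fig. 14.9). A most restrictive requirement is, of course, that x is differentiable or that $\dot{x}$ is measurable. ■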
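The plain-Python sketch below illustrates Eqs. (14.56)-(14.59). Here $\dot{x}$ is not measured. It is approximated by central differences of the sampled x, which is one possible way to meet the requirement stated at the end of the example when the data are smooth and noise-free. The simulated data (Heun's method), the input and the true parameters (those of Example 14.5) are assumptions, and `solve` is a hypothetical helper for the 3 × 3 system.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A theta = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            fac = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= fac * M[k][j]
    theta = [0.0] * n
    for k in reversed(range(n)):
        theta[k] = (M[k][n] - sum(M[k][j] * theta[j] for j in range(k + 1, n))) / M[k][k]
    return theta

a1, b11, b01 = 1.0, 2.0, 3.0                  # assumed true parameters
h, n = 1e-3, 20000                            # step [s] and number of steps (T = 20 s)
t = [k * h for k in range(n + 1)]
u = [1.0 + 0.5 * math.sin(tk) + 0.4 * math.sin(2.3 * tk) for tk in t]

def rhs(x, uk):
    """Right-hand side of Eq. (14.55)."""
    return -a1 * x - b11 * x * uk + b01 * uk

x = [0.0] * (n + 1)
for k in range(n):                            # Heun's method
    k1 = rhs(x[k], u[k])
    k2 = rhs(x[k] + h * k1, u[k + 1])
    x[k + 1] = x[k] + 0.5 * h * (k1 + k2)

R = [[0.0] * 3 for _ in range(3)]             # approximates int grad_e grad_e^T dt
r = [0.0] * 3                                 # approximates int xdot grad_e dt
for k in range(1, n):
    xdot = (x[k + 1] - x[k - 1]) / (2 * h)    # central-difference derivative
    g = [x[k], x[k] * u[k], -u[k]]            # grad_theta e, Eq. (14.57)
    for i in range(3):
        r[i] += h * xdot * g[i]
        for j in range(3):
            R[i][j] += h * g[i] * g[j]
theta = solve(R, [-ri for ri in r])           # Eq. (14.59)
print(theta)
```

Differentiating noisy measurements this way amplifies the noise. That is one reason the filtered formulations in this chapter are usually preferable.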


Operator transformation methods 

The method presented in Chapter 12 can be extended to a class of nonlinear systems, and we show this by means of an example.

Example 14.7—Identification of a nonlinear system

Assume that the objective is to find estimates, based on observations of u and y, of the unknown parameters a, b in the differential equation

$$ \dot{y}(t) = a\, y(t) + b\, y^2(t) + u(t), \qquad \theta = \begin{bmatrix} a \\ b \end{bmatrix} \qquad (14.60) $$

A parametric model needs to be developed. A Laplace transformation gives

$$ sY(s) = aY(s) + b\,\mathcal{L}\{y^2(t)\} + U(s) \qquad (14.61) $$

with the Laplace variable s. Initial conditions have been ignored. An operator translation Eq. (12.8) with inverse Laplace transformation then gives the parametric model

$$ \frac{1}{\tau}\big[y(t) - \lambda y(t)\big] = a\big[\lambda y(t)\big] + b\big[\lambda(y^2(t))\big] + \big[\lambda u(t)\big] \qquad (14.62) $$

Implementation of the low-pass filtered signals $\lambda y$, $\lambda(y^2)$, and $\lambda u$ gives the estimation model

$$ \frac{1}{\tau}(y - \lambda y) - \lambda u = \begin{bmatrix} \lambda y & \lambda(y^2) \end{bmatrix} \theta, \qquad \theta = \begin{bmatrix} a \\ b \end{bmatrix} \qquad (14.63) $$

Sampling of the filtered signals on both sides of Eq. (14.63) provides data for identification, and a standard least-squares solution with $\hat\theta = (-1.000 \;\; -2.000)^T$ gives good accuracy (see Fig. 14.10). Real-time identification based on Eq. (14.63) is equally relevant, and sampling of the terms in Eq. (14.63) with subsequent recursive least-squares identification of θ based on the data in Fig. 14.10 gives the result according to Fig. 14.11. It is seen that the method provides good accuracy and convergence rates. The estimation model is similar to that obtained with a heuristic filtering of data appearing on both sides of Eq. (14.63). ■

[Figure panels: input u, output y, and parameter estimates versus time [s]]

Figure 14.10 Simulation of input u, output y together with estimation of the coefficients a = 1, b = 2 of Eq. (14.60). The input has been chosen to demonstrate the nonlinear characteristics. All graphs versus time [s].
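The estimation model of Eq. (14.63) can be tried out with the plain-Python sketch below. It rests on several assumptions: the filter time constant is $\tau = 1$, the input is strictly positive, and the true values are $a = -1$, $b = -2$, matching the estimate quoted in the example. The data are simulated with Heun's method, and $\lambda = 1/(1 + p\tau)$ is discretized with the trapezoidal (Tustin) rule, so both are second-order accurate. Because y(0) = 0 and all filters start at rest, the ignored initial conditions really are zero. `lstsq` and `lowpass` are hypothetical helper names.

```python
import math

def lstsq(rows, ys):
    """Least-squares solution of rows @ theta ~= ys via normal equations and Gaussian elimination."""
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))      # partial pivoting
        A[k], A[p], c[k], c[p] = A[p], A[k], c[p], c[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    theta = [0.0] * n
    for k in reversed(range(n)):
        theta[k] = (c[k] - sum(A[k][j] * theta[j] for j in range(k + 1, n))) / A[k][k]
    return theta

a, b, tau = -1.0, -2.0, 1.0      # assumed true parameters and filter time constant
h, n = 1e-3, 30000               # step [s] and number of steps (30 s)
t = [k * h for k in range(n + 1)]
u = [1.0 + 0.5 * math.sin(0.5 * tk) + 0.4 * math.sin(1.7 * tk) for tk in t]   # u >= 0.1

def rhs(y, uk):
    """Right-hand side of Eq. (14.60)."""
    return a * y + b * y * y + uk

y = [0.0] * (n + 1)              # y(0) = 0
for k in range(n):               # Heun's method
    k1 = rhs(y[k], u[k])
    k2 = rhs(y[k] + h * k1, u[k + 1])
    y[k + 1] = y[k] + 0.5 * h * (k1 + k2)

def lowpass(s):
    """lambda = 1/(1 + p*tau), Tustin (trapezoidal) discretization, zero initial state."""
    c = h / (2 * tau)
    out = [0.0] * len(s)
    for k in range(len(s) - 1):
        out[k + 1] = ((1 - c) * out[k] + c * (s[k] + s[k + 1])) / (1 + c)
    return out

ly, ly2, lu = lowpass(y), lowpass([v * v for v in y]), lowpass(u)
ks = range(0, n + 1, 50)         # sample the filtered signals every 50 steps
gamma = [(y[k] - ly[k]) / tau - lu[k] for k in ks]    # left-hand side of Eq. (14.63)
rows = [[ly[k], ly2[k]] for k in ks]
theta = lstsq(rows, gamma)       # estimates of [a, b]
print(theta)
```

Replacing the batch `lstsq` call with a recursive least-squares update over the same sampled rows would give the real-time variant described in the example.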


Example 14.8—Identification of a physical parameter in a robot

Adaptive control of robot manipulator motion attracts considerable interest in the robotics literature. The methods presented for identification are very relevant to the solution of such problems, as will be described in this example.

We consider the two-link example in Fig. 14.12 with point masses $m_1, m_2$ [kg], lengths $l_1, l_2$ [m], angles $q_1, q_2$ [rad], and τ as the vector of joint torques $\tau_1, \tau_2$ [Nm]. The end-effector load $m_2$ is assumed to vary over a certain range.
[Figure panels: input and output, estimates of a and b, prediction error; all versus time [s]]

Figure 14.11 Input u and output y of Eq. (14.63) with the estimated parameters a, b by means of recursive identification.


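The equations of rigid-body mechanics are

$$ M(q)\ddot{q} + C(q, \dot{q})\dot{q} + G(q) = \tau; \qquad \theta = m_2 \qquad (14.64) $$

with the inertia matrix

$$ M(q) = \begin{bmatrix} (m_1 + m_2) l_1^2 & m_2 l_1 l_2 (c_1 c_2 + s_1 s_2) \\ m_2 l_1 l_2 (c_1 c_2 + s_1 s_2) & m_2 l_2^2 \end{bmatrix} \qquad (14.65) $$

where $c_2$ is a short notation for $\cos(q_2)$, and similarly for $c_1$, $s_1$, $s_2$. The Coriolis and centripetal torques are described by the matrix

$$ C(q, \dot{q})\dot{q} = m_2 l_1 l_2 (c_1 s_2 - s_1 c_2) \begin{bmatrix} -\dot{q}_2^2 + \dot{q}_1 \dot{q}_2 \\ \dot{q}_1^2 - \dot{q}_1 \dot{q}_2 \end{bmatrix} \qquad (14.66) $$

and the gravitation by

$$ G(q) = g \begin{bmatrix} -(m_1 + m_2) l_1 s_1 \\ -m_2 l_2 s_2 \end{bmatrix} \qquad (14.67) $$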
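These equations are nonlinear in the angular positions $q_1, q_2$, with polynomials in the derivatives of $q_1, q_2$. The equations exhibit complicated nonlinear behavior also without any changes in the mass $m_2$. Multiplying Eq. (14.64) from the left by $\dot{q}^T$ and integrating over [0, T] gives the terms

$$ \int_0^T \dot{q}^T M(q)\ddot{q}\, dt = \Big[\tfrac{1}{2}\dot{q}^T M(q)\dot{q}\Big]_0^T - \frac{1}{2}\int_0^T \dot{q}^T \frac{dM(q)}{dt}\dot{q}\, dt = \Big[\tfrac{1}{2}\dot{q}^T M(q)\dot{q}\Big]_0^T - m_2 \int_0^T l_1 l_2 (c_1 s_2 - s_1 c_2)(\dot{q}_1^2 \dot{q}_2 - \dot{q}_1 \dot{q}_2^2)\, dt $$

$$ \int_0^T \dot{q}^T C(q, \dot{q})\dot{q}\, dt = m_2 \int_0^T 2 l_1 l_2 (c_1 s_2 - s_1 c_2)(\dot{q}_1^2 \dot{q}_2 - \dot{q}_1 \dot{q}_2^2)\, dt $$

$$ \int_0^T \dot{q}^T G(q)\, dt = -m_1 \int_0^T g l_1 s_1 \dot{q}_1\, dt - m_2 \int_0^T (g l_1 s_1 \dot{q}_1 + g l_2 s_2 \dot{q}_2)\, dt $$

$$ \int_0^T \dot{q}^T \tau\, dt = \int_0^T (\dot{q}_1 \tau_1 + \dot{q}_2 \tau_2)\, dt \qquad (14.68) $$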
Figure 14.13 Data from robot model Eq. (14.64) with forces $\tau_1, \tau_2$ and positions $q_1, q_2$. The estimated variable $m_2$ and the regressors $\varphi_1$ and $\varphi_2$ are also shown. All graphs versus time.


Assume that all terms with mi and m 2 as factors are collected in the two 
terms m^i and respectively. Then we can summarize Eq. (14.68) and 

Eq. (14.64) to the linear relationship 


[ q T rdt = ra\(p\{T) + m, 2 <p 2 {T) 

Jo 


and we can solve for the unknown m 2 by means of 

rT 


a * (T) = 


(14.69) 


(14.70) 


Implementation of the above integrals is feasible provided that $q$, $\dot q$, and the external forces $\tau$ are available for measurement. The unknown parameter $m_2$ may thus be estimated by linear estimation methods (see Fig. 14.13). ■
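A minimal sketch of the estimate (14.69), assuming simulated rather than measured data: the code simulates the two-link model (14.64)-(14.67) with hypothetical lengths, a known $m_1$, a "true" $m_2$, and hypothetical torques. Because the $C$ term cancels the $\frac12\dot q^T\dot M\dot q$ term in (14.68), the $m_1$ and $m_2$ terms integrate to energy differences, which the sketch uses directly as $\varphi_1(T)$ and $\varphi_2(T)$. Fitting $m_2$ by least squares over all sampling instants $T$, rather than at a single $T$ as in (14.70), is a design choice made here.

```python
import math

# Hypothetical values chosen only for illustration (not taken from the text).
GRAV = 9.81
L1, L2 = 1.0, 0.8
M1 = 2.0          # mass m1, assumed known
M2_TRUE = 1.5     # mass m2, the unknown theta to be recovered


def torque(t):
    """Hypothetical excitation torques (tau_1, tau_2)."""
    return 5.0 * math.sin(t), 2.0 * math.cos(2.0 * t)


def derivative(t, x):
    """x = (q1, q2, dq1, dq2, W) where W = integral of dq^T tau; returns dx/dt."""
    q1, q2, w1, w2, _ = x
    tau1, tau2 = torque(t)
    m11 = (M1 + M2_TRUE) * L1**2                       # inertia matrix, Eq. (14.65)
    m12 = M2_TRUE * L1 * L2 * math.cos(q1 - q2)        # c1 c2 + s1 s2 = cos(q1 - q2)
    m22 = M2_TRUE * L2**2
    k = M2_TRUE * L1 * L2 * math.sin(q2 - q1)          # m2 l1 l2 (c1 s2 - s1 c2)
    # right-hand side tau - C(q, dq) dq - G(q), Eqs. (14.66)-(14.67)
    rhs1 = tau1 + k * w2**2 + GRAV * (M1 + M2_TRUE) * L1 * math.sin(q1)
    rhs2 = tau2 - k * w1**2 + GRAV * M2_TRUE * L2 * math.sin(q2)
    det = m11 * m22 - m12**2
    acc1 = (m22 * rhs1 - m12 * rhs2) / det
    acc2 = (m11 * rhs2 - m12 * rhs1) / det
    return [w1, w2, acc1, acc2, w1 * tau1 + w2 * tau2]


def rk4(t, x, h):
    """One classical Runge-Kutta step."""
    k1 = derivative(t, x)
    k2 = derivative(t + h / 2, [a + h / 2 * b for a, b in zip(x, k1)])
    k3 = derivative(t + h / 2, [a + h / 2 * b for a, b in zip(x, k2)])
    k4 = derivative(t + h, [a + h * b for a, b in zip(x, k3)])
    return [a + h / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(x, k1, k2, k3, k4)]


def energies(x):
    """Kinetic plus potential energy per unit m1 and per unit m2."""
    q1, q2, w1, w2 = x[:4]
    e1 = 0.5 * L1**2 * w1**2 + GRAV * L1 * math.cos(q1)
    e2 = (0.5 * (L1**2 * w1**2 + L2**2 * w2**2)
          + L1 * L2 * math.cos(q1 - q2) * w1 * w2
          + GRAV * (L1 * math.cos(q1) + L2 * math.cos(q2)))
    return e1, e2


def estimate_m2(h=1e-3, t_end=5.0):
    """Least-squares fit of Eq. (14.69) over all sampling instants T."""
    x, t = [0.3, 0.5, 0.0, 0.0, 0.0], 0.0
    e10, e20 = energies(x)
    num = den = 0.0
    for _ in range(int(round(t_end / h))):
        x = rk4(t, x, h)
        t += h
        e1, e2 = energies(x)
        phi1, phi2 = e1 - e10, e2 - e20                # regressors phi_1(T), phi_2(T)
        num += phi2 * (x[4] - M1 * phi1)              # x[4] is the left side of (14.69)
        den += phi2**2
    return num / den


print(estimate_m2())   # close to M2_TRUE = 1.5
```

With measured data, the work integral would be computed by numerical quadrature of sampled $\dot q^T\tau$; here it is carried along as an extra state only to keep the sketch accurate.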
14.5 DISCUSSION AND CONCLUSIONS

Hybrid identification involves discrete-time computation although the identification object is formulated as a nonlinear continuous-time model. The discretization usually involves some approximation of the model for reasons of computation. Commonly used trade-off principles are step-response equivalence for zero-order-hold inputs and the bilinear or Euler transformations. The identification objects remain linear in the parameters under the bilinear and Euler transformations, but frequency-domain properties may be distorted. The method proposed in this chapter is not explicitly affected by such choices of discretization.

As it is sometimes difficult to find transfer functions that are separable in the variables, a frequency-domain interpretation of methods such as the multidimensional fast Fourier transform is difficult to make. This fact might also complicate the use of identification results for control.

The initial conditions of the filters may have a harmful effect on the identification result. The simplest remedy is to start the filter well before data collection so that transients disappear. A simple rule of thumb is to wait for a time of $3\tau$, where $\tau$ is a filter time constant associated with $\lambda$ of Eq. (12.8). The initial conditions of the identification object may also introduce problems when their influence does not vanish with time.
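A quick numerical check of the $3\tau$ rule of thumb, assuming a first-order filter $\dot x = -x/\tau$ with a hypothetical $\tau = 0.5$ s (the filter of Eq. (12.8) may be of higher order, so this is only an illustration): an initial-condition transient decays as $e^{-t/\tau}$, so after $3\tau$ roughly 5% of it remains.

```python
import math


def residual_transient(tau=0.5, x0=1.0, h=1e-4, waits=3.0):
    """Zero-input response of dx/dt = -x/tau after waits*tau seconds, relative to x0."""
    x = x0
    for _ in range(int(round(waits * tau / h))):
        x += h * (-x / tau)            # forward Euler step
    return x / x0


print(round(residual_transient(), 4), round(math.exp(-3.0), 4))   # 0.0498 0.0498
```

Waiting $5\tau$ instead would leave under 1% of the transient, at the cost of discarding more data.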

Several of the methods presented are intuitively reasonable and appear in most cases as extensions of linear estimation methods. A drawback of several methods is the assumption that the state vector is available for measurement. In some cases it is possible to replace nonmeasurable variables by bandwidth-limited reconstructions. This is the case for the method of Example 14.6, where the reconstruction bandwidth is determined by the filter time constant $\tau$. The proposed method is in some respects similar to Laguerre networks, but there is no direct term that complicates the implementation.
14.7 EXERCISES

14.1 Consider the following nonlinear system with the unknown parameters a, b to be estimated from observations of u, y.

$$\ddot y(t) + a\,\dot y(t)\,y(t) + b\,y(t) = u(t) \tag{14.71}$$

This is a Liénard-type equation that typically arises when modeling mechanical systems with variable friction and damping. The Liénard equation often exhibits limit-cycle behavior that depends on the initial conditions (Birkhoff and Rota, 1974).

a. Make a parametric model that allows estimation of the parameters a and b from observations of u and y.

b. The initial conditions may have a strong influence on the system trajectories and partly determine the limit-cycle behavior. Modify the model so as to compensate for the harmful effect of initial conditions on the estimates.






Adaptive Systems 


15.1 INTRODUCTION 

The ability to adjust behavior with respect to changes in system parameters 
is called adaptation. A typical property is that current information is used to 
improve operating conditions by eliminating uncertainty so that some optimal 
state of the system is approached. 

Adaptive systems involve important aspects of both identification and control. Identification methods in real-time applications tend to become increasingly important, and such systems are often broadly advertised as adaptive systems or learning systems. Once an adequate, usually parametric, system representation is available, control actions become meaningful. In a wide sense, adaptive control may mean modification of the parameters of some control mechanism or of control actions in various contexts. The great application potential of adaptive systems lies in autonomous systems in which external actions and external perturbations cannot be defined in advance and whose statistical characteristics cannot be determined experimentally in advance. As adaptive systems can be viewed as extensions and applications of identification methodology, we elaborate on some aspects in this chapter.

Most adaptive systems, all of which require adequate parametric representations, are intended for autonomous operation. In this context there are a number of problems. First, it is necessary to determine the model structure. Second, it is necessary to determine the relevant parameters for a given structure. Third, a suitable control algorithm should be chosen, bearing in mind that the result of identification might be inaccurate. There appear to be many difficulties in solving the first and third problems, whereas the second problem can be successfully approached by recursive identification and related gradient search methods. A reason for the difficulties associated with the first problem is the impact of mismodeled or unmodeled dynamics such as time delays, nonlinearities, or uncertainty in coefficients. As with all model-fitting techniques, one can always try to reduce the error by assuming enough degrees of freedom. However, as mathematical modeling involves some idealization of physical properties, there is a source of model uncertainty or incompleteness that usually cannot be described in probabilistic terms.

The third problem is related to parameter uncertainty and to the two feedback loops associated with control and adaptation, where the control actions supply information to the identification and adaptation procedure, which in turn generates the control action. The interaction between the adaptation algorithm and the control algorithm is thus much more complex in feedback control operation than in open-loop adaptive systems.

The scientific evolution in adaptive system theory seems to have at least two sources: first, attempts to understand and reproduce biological adaptation and, second, attempts to use optimization methods and parameter estimation in real-time applications. Research started with a biomedical interest in neurological adaptation related to behavior, growth, and changes in size of different parts and portions of the body and has, eventually, found important technological application. This historical circumstance has had a clear impact on terminology, despite the sometimes inadequate or even misleading biological interpretation.
Figure 15.1 Equalization.


First we give an example of adaptive techniques applied to inverse modeling 
with important application to communication channels. 

Example 15.1—Equalization 

Consider the following transfer function model for a communication channel

$$y_k = H(z)u_k = \frac{1}{A(z^{-1})}u_k, \qquad A(z^{-1}) = 1 + a_1z^{-1} + \cdots + a_nz^{-n} \tag{15.1}$$

where $H(z)$ represents some distortion of a signal $\{u_k\}$ transmitted through the communication channel and where $\{y_k\}$ represents the received signal (see Fig. 15.1). In order to accomplish high-quality communication it is desirable to restore the signal $\{u_k\}$ at the receiver end. Normally there is no other communication of $\{u_k\}$ to the receiver end of the channel, but by means of well-defined test samples sent through the system it is possible to adapt the transfer function $\hat H^{-1}(z) = \hat A(z^{-1})$ so that it implements the inverse of $H(z)$, to the extent that this is possible. It is noteworthy that the system operates in a feedforward manner except for the feedback involved in the adaptation. With the restored signal $\hat u_k = \hat A(z^{-1})y_k$, the restoration error is

$$\varepsilon_k = \hat u_k - u_k = \big(\hat A(z^{-1}) - A(z^{-1})\big)y_k = \varphi_k^T\tilde\theta \tag{15.2}$$

where

$$\varphi_k = \begin{pmatrix} y_{k-1} & y_{k-2} & \cdots & y_{k-n}\end{pmatrix}^T, \qquad \tilde\theta = \begin{pmatrix} \hat a_1 - a_1 & \hat a_2 - a_2 & \cdots & \hat a_n - a_n\end{pmatrix}^T \tag{15.3}$$

and the parameter estimation problem can be solved by means of standard recursive identification methods (see Chapter 11). ■
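A minimal sketch of Example 15.1, assuming a hypothetical second-order channel $A(z^{-1}) = 1 - 0.5z^{-1} + 0.2z^{-2}$ and random $\pm1$ test samples known to the receiver. Since $A(z^{-1})y_k = u_k$, the regression $u_k - y_k = \varphi_k^T a$ is linear in $a = (a_1, \ldots, a_n)^T$, and plain recursive least squares (one of the Chapter 11 methods) estimates it; the estimate then serves as the FIR equalizer $\hat u_k = \hat A(z^{-1})y_k$.

```python
import random


def simulate_channel(a, u):
    """y_k = u_k / A(z^-1), i.e. y_k = u_k - a_1 y_{k-1} - ... - a_n y_{k-n}."""
    y = []
    for k, uk in enumerate(u):
        y.append(uk - sum(a[i] * y[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0))
    return y


def rls_equalizer(u, y, n, p0=1000.0):
    """Recursive least squares for u_k - y_k = phi_k^T a, phi_k = (y_{k-1}, ..., y_{k-n})."""
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n, len(y)):
        phi = [y[k - 1 - i] for i in range(n)]
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
        gain = [v / denom for v in Pphi]
        err = (u[k] - y[k]) - sum(phi[i] * theta[i] for i in range(n))
        theta = [theta[i] + gain[i] * err for i in range(n)]
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return theta


def equalize(a_hat, y):
    """Restored signal u_hat_k = A_hat(z^-1) y_k."""
    return [y[k] + sum(a_hat[i] * y[k - 1 - i] for i in range(len(a_hat)) if k - 1 - i >= 0)
            for k in range(len(y))]


rng = random.Random(0)
a_true = [-0.5, 0.2]                                      # hypothetical channel
u_train = [rng.choice([-1.0, 1.0]) for _ in range(500)]   # known test samples
y_train = simulate_channel(a_true, u_train)
a_hat = rls_equalizer(u_train, y_train, n=2)
u_hat = equalize(a_hat, y_train)
```

The sketch is noise-free, so the estimates converge essentially exactly; with channel noise the equalizer would only approximate the inverse.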








Chap. 15 Adaptive systems 



Figure 15.2 Adaptive control. 


Equalization as described above is a problem of inverse modeling with deconvolution of $H(z)$ operating on the output $\{y_k\}$. This type of device has proved useful in implementing various deconvolution-type procedures like echo cancellation and channel equalization. The problem is a standard problem in the branch of adaptive systems known as adaptive signal processing (see Widrow and Stearns, 1985).


15.2 HEURISTIC CONTROL METHODS 

Several ad hoc solutions to practical problems may involve control systems with different characteristics for a range of operating conditions. Control systems designed in such a context may often involve some mechanism for modification of the controller and may be described in a form similar to Fig. 15.2. In its simplest form of gain scheduling, the “adaptive algorithm” may consist of look-up tables containing controller parameters for various signal ranges and modes of operation. Such methods are of practical importance and may be used with or without some updating mechanism for the table parameters. However, in order to distinguish adaptive systems from any system with feedback, it is desirable that the qualification “adaptive” be reserved for cases where the look-up table somehow can be modified from data. In turn, this requires some form of identification procedure.
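A minimal sketch of a gain-scheduling look-up table, assuming a hypothetical table of PI parameters $(K_p, T_i)$ indexed by a scheduling variable $w$ such as a measured operating point. Linear interpolation between table entries and clamping outside the table are assumptions here, one common choice. By the distinction drawn above, a fixed table like this is not yet adaptive; it becomes so only if its entries are updated from data.

```python
from bisect import bisect_right

# Hypothetical schedule: scheduling variable w -> (Kp, Ti)
SCHEDULE = [
    (0.0, (2.0, 10.0)),
    (1.0, (1.5, 8.0)),
    (2.0, (1.0, 5.0)),
]


def scheduled_gains(w):
    """Interpolate controller parameters linearly in w; clamp outside the table."""
    points = [p for p, _ in SCHEDULE]
    if w <= points[0]:
        return SCHEDULE[0][1]
    if w >= points[-1]:
        return SCHEDULE[-1][1]
    i = bisect_right(points, w) - 1
    (w0, g0), (w1, g1) = SCHEDULE[i], SCHEDULE[i + 1]
    s = (w - w0) / (w1 - w0)
    return tuple(a + s * (b - a) for a, b in zip(g0, g1))


print(scheduled_gains(0.5))   # (1.75, 9.0)
```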


Indirect adaptive control 

A control system coupled to an identification procedure is often called an adaptive control system. A more systematic approach involves matching of a system description of a control object obtained by means of identification

$$A(z^{-1})y_k = B(z^{-1})u_k + v_k \tag{15.4}$$

with a feedback control law

$$R(z^{-1})u_k = -S(z^{-1})y_k + T(z^{-1})r_k \tag{15.5}$$

Elimination of $u_k$ between Eq. (15.4) and Eq. (15.5) yields the closed-loop system

$$\big(A(z^{-1})R(z^{-1}) + B(z^{-1})S(z^{-1})\big)y_k = B(z^{-1})T(z^{-1})r_k + R(z^{-1})v_k \tag{15.6}$$

Model matching is a well-known method of control system design that involves specification of the desired system response $Q(z^{-1})/P(z^{-1})$ in terms of a denominator polynomial $P(z^{-1})$ and a numerator polynomial $Q(z^{-1})$. Suitable choices for model matching with the polynomials $R$, $S$, and $T$ of Eq. (15.5) are thus found by matching poles and zeros of Eq. (15.6) with those of $Q/P$. Pole assignment can be accomplished by solving the Diophantine equation

$$A(z)R(z) + B(z)S(z) = P(z) \tag{15.7}$$


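With $A(z^{-1}) = 1 + a_1z^{-1} + \cdots + a_{n_A}z^{-n_A}$, $B(z^{-1}) = b_1z^{-1} + \cdots + b_{n_B}z^{-n_B}$, $R(z^{-1}) = 1 + r_1z^{-1} + \cdots + r_{n_R}z^{-n_R}$, $S(z^{-1}) = s_0 + s_1z^{-1} + \cdots + s_{n_S}z^{-n_S}$, $n_R = n_B - 1$, $n_S = n_A - 1$, and $P(z^{-1}) = 1 + p_1z^{-1} + p_2z^{-2} + \cdots$, the Diophantine equation can be solved by solving the following system of linear equations, obtained by equating coefficients of equal powers of $z^{-1}$:

$$\begin{pmatrix}
1 & 0 & \cdots & 0 & b_1 & 0 & \cdots & 0\\
a_1 & 1 & \ddots & \vdots & b_2 & b_1 & \ddots & \vdots\\
\vdots & a_1 & \ddots & 0 & \vdots & b_2 & \ddots & 0\\
a_{n_A} & \vdots & \ddots & 1 & b_{n_B} & \vdots & \ddots & b_1\\
0 & a_{n_A} & & a_1 & 0 & b_{n_B} & & b_2\\
\vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & a_{n_A} & 0 & \cdots & 0 & b_{n_B}
\end{pmatrix}
\begin{pmatrix} r_1\\ \vdots\\ r_{n_R}\\ s_0\\ \vdots\\ s_{n_S}\end{pmatrix}
=
\begin{pmatrix} p_1 - a_1\\ \vdots\\ p_{n_A} - a_{n_A}\\ p_{n_A+1}\\ \vdots\\ p_{n_A+n_B-1}\end{pmatrix} \tag{15.8}$$

where the matrix structure in Eq. (15.8) is known as a Sylvester matrix. The adaptive control method presented is known as indirect adaptive control.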
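A minimal sketch of solving the Diophantine equation through the Sylvester system (15.8), assuming the coefficient conventions stated with it ($A$ and $R$ monic, $B$ with no direct term, $n_R = n_B - 1$, $n_S = n_A - 1$). The helper names are hypothetical. Plain Gaussian elimination is used here; with estimated parameters, a rank-revealing solver may be preferable. The result is checked by multiplying the polynomials back out.

```python
def polymul(p, q):
    """Product of polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def gauss_solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("singular Sylvester matrix: A and B not coprime?")
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def sylvester(a, b):
    """Sylvester matrix of Eq. (15.8) for unknowns (r_1..r_nR, s_0..s_nS)."""
    nA, nB = len(a), len(b)
    nR, n = nB - 1, nA + nB - 1
    M = [[0.0] * n for _ in range(n)]
    for j in range(nR):                         # column of r_{j+1}: 1, a_1, ..., a_nA
        for i, ai in enumerate([1.0] + list(a)):
            M[j + i][j] = ai
    for i in range(nA):                         # column of s_i: b_1, ..., b_nB
        for m, bm in enumerate(b):
            M[i + m][nR + i] = bm
    return M


def diophantine(a, b, p):
    """Solve A R + B S = P; a = [a_1..], b = [b_1..], p = [p_1..p_{nA+nB-1}]."""
    n = len(a) + len(b) - 1
    if len(p) != n:
        raise ValueError("p must hold p_1 .. p_{nA+nB-1}")
    rhs = [p[k] - (a[k] if k < len(a) else 0.0) for k in range(n)]
    x = gauss_solve(sylvester(a, b), rhs)
    nR = len(b) - 1
    return [1.0] + x[:nR], x[nR:]               # R = 1 + r_1 z^-1 + ..., S = s_0 + ...


a, b, p = [-1.5, 0.7], [1.0, 0.5], [0.0, 0.0, 0.0]   # hypothetical values, poles at z = 0
R, S = diophantine(a, b, p)
lhs = [x + y for x, y in zip(polymul([1.0] + a, R), polymul([0.0] + b, S))]
print(lhs)   # approximately [1, 0, 0, 0], i.e. A R + B S = P
```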

Example 15.2—Indirect adaptive control 
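Let the system

$$\frac{Y(z)}{U(z)} = H(z) = \frac{b_1z + b_2}{z^2 + a_1z + a_2} \tag{15.9}$$

be controlled by means of output feedback control (15.5) designed for pole assignment with the denominator polynomial $P(z) = z^3 + p_1z^2 + p_2z + p_3$. The polynomial $T(z^{-1})$ in Eq. (15.5) can, for instance, simply be chosen such that the controlled system has static gain equal to 1, i.e.,

$$T(z^{-1}) = t_0 = \frac{P(1)}{B(1)} = \frac{1 + p_1 + p_2 + p_3}{b_1 + b_2} \tag{15.10}$$

These choices suggest the controller

$$u_k = -r_1u_{k-1} - s_0y_k - s_1y_{k-1} + t_0r_k \tag{15.11}$$

Equation (15.8) including a Sylvester matrix is then reduced to the linear equation

$$\begin{pmatrix} 1 & b_1 & 0\\ a_1 & b_2 & b_1\\ a_2 & 0 & b_2\end{pmatrix}\begin{pmatrix} r_1\\ s_0\\ s_1\end{pmatrix} = \begin{pmatrix} p_1 - a_1\\ p_2 - a_2\\ p_3\end{pmatrix} \tag{15.12}$$

or, in short,

$$A(\theta)\vartheta = b(\theta), \qquad\text{with}\quad \vartheta = \begin{pmatrix} r_1 & s_0 & s_1\end{pmatrix}^T \tag{15.13}$$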
Figure 15.3 Indirect adaptive control starting at time t = 100. The second-order process is controlled by an output feedback control law with pole assignment to the origin z = 0. The histories of the input u, output y, and the estimated process parameters and the controller parameters are displayed.


A simulation depicting the adaptive control performance is shown in Fig. 15.3. 
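A minimal, noise-free sketch of the indirect adaptive loop of Example 15.2, assuming hypothetical process parameters $a_1 = -1.5$, $a_2 = 0.7$, $b_1 = 1$, $b_2 = 0.5$, a square-wave reference, and random open-loop excitation before the adaptive controller is switched on (a loose analogue of the start at $t = 100$ in Fig. 15.3, not a reproduction of it). At each step the process parameters are re-estimated by recursive least squares, Eq. (15.12) is solved with $p_1 = p_2 = p_3 = 0$, $t_0$ follows from (15.10), and the control law (15.11) is applied.

```python
import random

A_TRUE = (-1.5, 0.7)   # hypothetical a_1, a_2 of Eq. (15.9)
B_TRUE = (1.0, 0.5)    # hypothetical b_1, b_2


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Cramer's rule for the 3x3 system (15.12); assumes a nonsingular matrix."""
    d = det3(m)
    return [det3([[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]) / d
            for col in range(3)]


def indirect_adaptive(n_steps=400, n_open=50, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * 4                                   # estimates of (a1, a2, b1, b2)
    P = [[1e4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    y, u, r = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]        # list index = time k
    for k in range(2, n_steps):
        y_k = (-A_TRUE[0] * y[-1] - A_TRUE[1] * y[-2]
               + B_TRUE[0] * u[-1] + B_TRUE[1] * u[-2])  # process output
        # recursive least squares with phi = (-y_{k-1}, -y_{k-2}, u_{k-1}, u_{k-2})
        phi = [-y[-1], -y[-2], u[-1], u[-2]]
        Pphi = [sum(P[i][j] * phi[j] for j in range(4)) for i in range(4)]
        denom = 1.0 + sum(a * b for a, b in zip(phi, Pphi))
        err = y_k - sum(a * b for a, b in zip(theta, phi))
        theta = [t + p * err / denom for t, p in zip(theta, Pphi)]
        P = [[P[i][j] - Pphi[i] * Pphi[j] / denom for j in range(4)] for i in range(4)]
        y.append(y_k)
        r_k = 1.0 if (k // 50) % 2 == 0 else -1.0        # square-wave reference
        r.append(r_k)
        if k < n_open:
            u_k = rng.choice([-1.0, 1.0])                # open-loop excitation
        else:
            a1, a2, b1, b2 = theta
            # Eq. (15.12) with p1 = p2 = p3 = 0 (all closed-loop poles at z = 0)
            r1, s0, s1 = solve3([[1.0, b1, 0.0], [a1, b2, b1], [a2, 0.0, b2]],
                                [-a1, -a2, 0.0])
            t0 = 1.0 / (b1 + b2)                         # Eq. (15.10) with P(1) = 1
            u_k = -r1 * u[-1] - s0 * y[-1] - s1 * y[-2] + t0 * r_k   # Eq. (15.11)
        u.append(u_k)
    return theta, y, r


theta_hat, y_hist, r_hist = indirect_adaptive()
```

With all poles at the origin, the output settles on each reference level within a few samples once the estimates are accurate. With noise or an overestimated model order, `solve3` would need the rank-deficiency safeguards discussed next.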


The matrix $A(\theta)$ in Eq. (15.8) can, of course, be singular or close to singular when the model order is overestimated and when the true parameters $a_1$, $a_2$, $b_1$, and $b_2$ are substituted by their estimated counterparts. Reliable implementations of this type of controller therefore require an equation solver that handles rank deficiency (see Appendix A). A systematic solution is, of course, to use the pseudoinverse $A^+(\hat\theta)$ to solve for the controller parameters

$$\hat\vartheta = A^+(\hat\theta)\,b(\hat\theta) \tag{15.14}$$

It should be borne in mind that a rank-deficient equation of the type (15.13) has an infinite number of least-squares solutions and that the solution (15.14) has the smallest 2-norm of all minimizers of $\|A\vartheta - b\|_2$ (see Appendix A).
Figure 15.4 Neuron model of the type used in artificial neural networks.


15.3 ASPECTS ON NEURAL NETWORKS 


McCulloch and Pitts (1943) introduced a neuron model operating like threshold functions that could generate fairly complicated behavior. Rosenblatt generalized the McCulloch-Pitts model by adding adaptation, calling this model a perceptron, which has had an enormous impact on the field.

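A neuron model of the type depicted in Fig. 15.4 consists of an input pattern (or input vector) $u = (u_1, \ldots, u_m)^T$, an output pattern (or output vector) $y = (y_1, y_2, \ldots, y_n)^T$, and gain parameters $w_{ij}$ called weights. The input-output relationship is

$$\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix} = \begin{pmatrix} f(z_1)\\ f(z_2)\\ \vdots\\ f(z_n)\end{pmatrix}, \quad\text{where}\quad \begin{pmatrix} z_1\\ z_2\\ \vdots\\ z_n\end{pmatrix} = \begin{pmatrix} w_{10} & w_{11} & \cdots & w_{1m}\\ \vdots & \vdots & & \vdots\\ w_{n0} & w_{n1} & \cdots & w_{nm}\end{pmatrix}\begin{pmatrix} u_0\\ u_1\\ \vdots\\ u_m\end{pmatrix} \tag{15.15}$$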
Figure 15.5 Artificial neural networks with two neural cell layers interconnected according to the perceptron principle, where the first-layer output $y^{(1)}$ provides input to the second layer. Each square element in the figure represents an artificial neuron with many inputs and one output.

where $u_0$ is an offset. The nonlinear function $f(\cdot)$ should be chosen among the functions that map real numbers onto an interval of the real numbers. Standard choices of $f(\cdot)$ are sigmoid functions such as

$$f(x) = \tanh x \qquad\text{or}\qquad f(x) = \frac{1}{1 + e^{-x}} \tag{15.16}$$
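A minimal sketch of the neuron layer (15.15) with the sigmoids of (15.16). The weight values are hypothetical, and the offset input is assumed to be $u_0 = 1$, so column 0 of the weight matrix acts as a bias.

```python
import math


def logistic(x):
    """Logistic sigmoid 1 / (1 + e^-x), an alternative choice of f."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(W, u, f=math.tanh, u0=1.0):
    """Eq. (15.15): z = W (u0, u1, ..., um)^T and y_i = f(z_i)."""
    x = [u0] + list(u)
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return [f(zi) for zi in z], z


W = [[0.0, 1.0, -1.0],     # hypothetical weights; column 0 multiplies the offset u0
     [0.5, 0.0, 2.0]]
y, z = layer_forward(W, [0.3, 0.3])
print(y)                   # first output tanh(0) = 0, second tanh(1.1)
```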

Widrow designed a device called the adaptive linear element, or adaline, that adjusts its gain parameters (or weights) between the input and the output in response to the error between the computed and desired outputs. The function of an adaline device is closely related to that of a perceptron element (see Figs. 15.4 and 15.5) and can be described as follows.

Let $y^r$ be some desired behavior of the artificial neurons. Each error component $e_i$ of the error $e = y - y^r$ is then

$$e_i = y_i - y_i^r = f(z_i) - y_i^r \tag{15.17}$$

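By adopting gradient search techniques similar to those in Appendix C, we can approach the minimum of the loss function

$$J(e) = \frac12e^Te = \frac12\sum_{i=1}^n (y_i - y_i^r)^2 \tag{15.18}$$

by choosing

$$\dot w_{ij} = -\gamma_{ij}\frac{\partial J}{\partial w_{ij}}, \qquad \gamma_{ij} > 0 \tag{15.19}$$

or

$$\dot w_{ij} = -\gamma_{ij}\Big(\frac{\partial f}{\partial w_{ij}}\Big)^Te_i(t) = -\gamma_{ij}f'(z_i)u_j(t)e_i(t), \quad\text{where}\quad f'(z_i) = \frac{df}{dz}\Big|_{z=z_i} \tag{15.20}$$

so that, for a constant input pattern $u$ and desired output $y^r$, the loss decreases in the course of time according to

$$\frac{dJ}{dt} = \sum_{i,j}\frac{\partial J}{\partial w_{ij}}\dot w_{ij} = -\sum_{i,j}\gamma_{ij}\big(f'(z_i)u_je_i\big)^2 \le 0 \tag{15.21}$$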
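A discrete-time sketch of the gradient rule (15.20), assuming a single tanh neuron with offset $u_0 = 1$, one constant step size $\gamma$ in place of the individual $\gamma_{ij}$, and a hypothetical set of four input/target patterns. Euler-discretizing $\dot w_{ij} = -\gamma f'(z_i)u_je_i$ gives the update below. For a small enough $\gamma$, the summed loss (15.18) per pass over the patterns decreases, in line with (15.21).

```python
import math


def train_single_layer(W, patterns, gamma=0.1, epochs=200):
    """Delta rule w_ij <- w_ij - gamma f'(z_i) u_j e_i with f = tanh and u_0 = 1."""
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for u, y_ref in patterns:
            x = [1.0] + list(u)
            for i, row in enumerate(W):
                z = sum(w * xj for w, xj in zip(row, x))
                e = math.tanh(z) - y_ref[i]              # Eq. (15.17)
                loss += 0.5 * e * e                       # Eq. (15.18)
                dfdz = 1.0 - math.tanh(z) ** 2            # f'(z) for f = tanh
                for j in range(len(row)):
                    row[j] -= gamma * dfdz * x[j] * e     # Eq. (15.20), Euler step
        losses.append(loss)
    return losses


W = [[0.0, 0.0, 0.0]]                                     # one neuron, two inputs
patterns = [((0.0, 0.0), (-0.5,)), ((1.0, 0.0), (0.2,)),
            ((0.0, 1.0), (0.2,)), ((1.0, 1.0), (0.6,))]    # hypothetical teacher data
losses = train_single_layer(W, patterns)
print(losses[0], losses[-1])                              # the loss shrinks markedly
```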


Much of the effort in the design of artificial neural networks consists of composing large-scale structures of identical elements. Following biological inspiration and early perceptron concepts, it is popular to propose multilayer networks reminiscent of retinal organization in the human eye (see Fig. 15.5). Let $W^{(k)}$ denote the matrix of parameter weightings in layer $k$, and let $y^r$ denote the prescribed output behavior. Assuming that there are $p$ layers in such a layered network, where the output of layer $k$ is connected to the input of layer $k+1$ for $k = 1, 2, \ldots, p-1$, the output $y^{(p)}$ of layer $p$ represents the network output. A problem that arises in this context is how to organize the adaptation by using the global output performance $e^{(p)}$ as a means to control the local parameters. In the absence of a “teacher” acting at an intermediate local level in the network, a fruitful idea is to calculate local errors $e^{(k)}$ (i.e., output errors associated with each layer $k$), based on the global error $e^{(p)} = y^{(p)} - y^r$. The local errors can then be used for parameter adaptation at each neuron. This approach can be justified by means of optimal control theory according to the following motivation. Consider the performance index

$$J(e^{(p)}) = \frac12(e^{(p)})^Te^{(p)} = \frac12(e^{(p)})^Te^{(p)} - \sum_{k=1}^p(\lambda^{(k)})^T\big(y^{(k)} - f(W^{(k)}y^{(k-1)})\big), \qquad y^{(0)} = u \tag{15.22}$$

where the Lagrange multipliers $\{\lambda^{(k)}\}_{k=1}^p$ have been introduced to “adjoin” the constraint equations imposed by the neuron interconnections in a neural network organized in $p$ layers. Let the indeterminate Lagrange multipliers be chosen according to

$$\lambda^{(p)} = e^{(p)}, \qquad \lambda^{(k)} = (W^{(k+1)})^TF'(z^{(k+1)})\lambda^{(k+1)}, \quad k = 1, \ldots, p-1, \qquad\text{where}\quad F'(z^{(k)}) = \operatorname{diag}\Big(\frac{df}{dz}\Big|_{z=z_i^{(k)}}\Big) \tag{15.23}$$
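which are the adjoint equations of a $p$-stage optimal control problem. Then it is possible to simplify the gradient of $J$ according to

$$\frac{\partial J}{\partial w_{ij}^{(k)}} = \lambda_i^{(k)}\frac{\partial y_i^{(k)}}{\partial w_{ij}^{(k)}} \tag{15.24}$$

A gradient search for a minimum value of the loss function (15.22) satisfying the constraint equations is

$$\dot w_{ij}^{(k)} = -\gamma_{ij}^{(k)}\lambda_i^{(k)}\frac{\partial y_i^{(k)}}{\partial w_{ij}^{(k)}} = \begin{cases} -\gamma_{ij}^{(k)}\lambda_i^{(k)}f'(z_i^{(k)})\,y_j^{(k-1)}, & k = 2, \ldots, p\\ -\gamma_{ij}^{(1)}\lambda_i^{(1)}f'(z_i^{(1)})\,u_j, & k = 1\end{cases} \tag{15.25}$$

so that

$$\frac{dJ}{dt} = -\sum_{k,i,j}\gamma_{ij}^{(k)}\Big(\frac{\partial J}{\partial w_{ij}^{(k)}}\Big)^2 \le 0 \tag{15.26}$$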


The method defined by Eqs. (15.23-15.25), known as back propagation, is one such method of computing local errors from the global errors. In turn, the local errors can be used for local adaptation by means of gradient search methods (see Fig. 15.4).
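A minimal sketch of Eqs. (15.23)-(15.25) for a two-layer tanh network, assuming no offset inputs so that the adjoint recursion stays short (an assumption, not the text's exact setup). The adjoint $\lambda$ is propagated backward from $\lambda^{(p)} = e^{(p)}$, the gradient $\lambda_i^{(k)}f'(z_i^{(k)})y_j^{(k-1)}$ is formed, and the result is compared with central finite differences of $J$. The weights, input, and target are random or hypothetical.

```python
import math
import random


def forward(Ws, u):
    """Layer recursion y^(k) = f(W^(k) y^(k-1)) with y^(0) = u and f = tanh (no offsets)."""
    ys, zs = [list(u)], []
    for W in Ws:
        z = [sum(w * v for w, v in zip(row, ys[-1])) for row in W]
        zs.append(z)
        ys.append([math.tanh(v) for v in z])
    return ys, zs


def loss(Ws, u, y_ref):
    """J = 1/2 e^T e with e = y^(p) - y^r."""
    y_out = forward(Ws, u)[0][-1]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_out, y_ref))


def backprop(Ws, u, y_ref):
    """Gradients dJ/dw_ij^(k) via the adjoint recursion (15.23)-(15.25)."""
    ys, zs = forward(Ws, u)
    lam = [a - b for a, b in zip(ys[-1], y_ref)]                       # lambda^(p) = e^(p)
    grads = [None] * len(Ws)
    for k in reversed(range(len(Ws))):
        delta = [l * (1.0 - math.tanh(z) ** 2) for l, z in zip(lam, zs[k])]  # lambda_i f'(z_i)
        grads[k] = [[d * v for v in ys[k]] for d in delta]             # times y_j^(k-1)
        lam = [sum(Ws[k][i][j] * delta[i] for i in range(len(delta)))
               for j in range(len(ys[k]))]                             # W^T F' lambda
    return grads


def max_gradient_error(seed=1, h=1e-6):
    """Largest gap between back propagation and central finite differences."""
    rng = random.Random(seed)
    Ws = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],  # layer 1: 3 -> 4
          [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]]  # layer 2: 4 -> 2
    u, y_ref = [0.5, -0.2, 0.8], [0.1, -0.3]
    grads = backprop(Ws, u, y_ref)
    worst = 0.0
    for k, W in enumerate(Ws):
        for i, row in enumerate(W):
            for j in range(len(row)):
                old = row[j]
                row[j] = old + h
                j_plus = loss(Ws, u, y_ref)
                row[j] = old - h
                j_minus = loss(Ws, u, y_ref)
                row[j] = old
                worst = max(worst, abs((j_plus - j_minus) / (2 * h) - grads[k][i][j]))
    return worst


print(max_gradient_error())   # tiny: the adjoint gradient matches finite differences
```

A gradient step $w \leftarrow w - \gamma\,\partial J/\partial w$ with these gradients is the discrete-time counterpart of (15.25).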


Hence, the back-propagation reformulation enables the transformation of a functional optimization problem into a parametric one. The parametric optimization according to gradient search techniques ensures that the adapting parameters approach at least some local minimum and is thus closely related to identification techniques. Other aspects of comparison between neural networks and identification theory, however, seem to be underexploited in this field, and several such aspects are open research issues. For example, artificial neural networks are sometimes characterized by an excessive overparametrization as compared to standard practice in identification methodology. The prediction error criterion would suggest that the prediction accuracy of such overparametrized devices is not optimal.


15.4 EXTREMUM CONTROL 

As shown in previous chapters, it is standard practice to apply gradient search methods to the minimization of parameter errors. Another possibility, known as extremum control, is simultaneous application of parameter estimation and optimization with respect to some control action. This is of particular interest in contexts where control is to be used as a means to improve some quality index. Extremum control is thus relevant in several industrial processes related to energy saving or emission control for environmental protection. Relevant examples of such control variables are the composition of raw materials to some industrial process or the fuel-to-air ratio in combustion engines.

Figure 15.6 Example of extremum control where the estimation result can be used to improve a quality index.

A central problem is to define a suitable quality index or efficiency measure $V$, which is usually assumed to be a quadratic function of control parameters $u$ with some unknown weighting matrix $P_2$, a vector $P_1$, and a scalar $P_0$:

$$V(u) = \frac12u^TP_2u + u^TP_1 + P_0 \tag{15.27}$$

where the set of control variables $u$ (usually some kind of rate variables) regulates the quality index $V$ toward an extremum. The quadratic function $V$ has its extremum at

$$u^* = -P_2^{-1}P_1 \tag{15.28}$$
and it is necessary to estimate $u^*$ from observations of the control parameter $u$ and the quality measure $V(u)$. A residual model for estimation of $P_2$, $P_1$, and $P_0$ from a control sequence $\{u_k\}$ and observations $\{V(u_k)\}$ is

$$\varepsilon_k = \frac12u_k^TP_2u_k + u_k^TP_1 + P_0 - V(u_k) \tag{15.29}$$

from which parameter estimates $\hat P_0(k)$, $\hat P_1(k)$, $\hat P_2(k)$ may be obtained by methods of previous chapters. A Newton-Raphson gradient search for $u^*$ based on the estimated parameters gives

$$u_k = u_{k-1} - \big(\nabla_u^2\hat V(u_{k-1},k)\big)^{-1}\nabla_u\hat V(u_{k-1},k) = u_{k-1} - \hat P_2^{-1}(k)\big(\hat P_2(k)u_{k-1} + \hat P_1(k)\big) = -\hat P_2^{-1}(k)\hat P_1(k) \tag{15.30}$$

Hence, a solution to the extremum control problem involves estimating $P_2$, $P_1$, and $P_0$; computing the extremum $u^*$; and applying this control parameter to the system.

Example 15.3—Extremum control 

Assume that an extremum control problem can be modeled by means of the relationship

$$V(u_k) = \frac12u_k^T\begin{pmatrix} 3 & 1\\ 1 & 3\end{pmatrix}u_k + u_k^T\begin{pmatrix} 1\\ 2\end{pmatrix} + 1 + w_k \tag{15.31}$$

where $\{w_k\}$ is a zero-mean white-noise sequence with $E\{w_k^2\} = 1$. The extremum of Eq. (15.31), with $V(u^*) = 0.3125$, is at

$$u^* = -P_2^{-1}P_1 = \begin{pmatrix} -0.125\\ -0.625\end{pmatrix} \tag{15.32}$$

A simulation is given in Fig. 15.6, where estimation and control start at $t = 10$, which results in an improvement of the quality index $V(u_k)$. Extremum control based on the least-squares estimate from 100 samples gives the estimate

$$\hat u^* = -\hat P_2^{-1}\hat P_1 = \begin{pmatrix} -0.104\\ -0.716\end{pmatrix} \tag{15.33}$$
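A minimal sketch of the estimation step of Example 15.3, assuming the quality index (15.31) and control parameters drawn uniformly from $[-1, 1]^2$. The random excitation is an assumption; the text's simulation in Fig. 15.6 may differ. The six parameters of $P_2$, $P_1$, $P_0$ enter the residual model (15.29) linearly, so ordinary least squares applies, and the extremum then follows from (15.30). Without noise, the true extremum (15.32) is recovered. With `sigma=1.0`, the estimate scatters around it in the way (15.33) illustrates, though not with the same numbers.

```python
import random

P2_TRUE = [[3.0, 1.0], [1.0, 3.0]]    # Eq. (15.31)
P1_TRUE = [1.0, 2.0]
P0_TRUE = 1.0


def quality(u, noise=0.0):
    """V(u) of Eq. (15.31) plus an additive disturbance."""
    quad = 0.5 * sum(u[i] * P2_TRUE[i][j] * u[j] for i in range(2) for j in range(2))
    return quad + u[0] * P1_TRUE[0] + u[1] * P1_TRUE[1] + P0_TRUE + noise


def solve(M, v):
    """Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def extremum_estimate(n=100, sigma=1.0, seed=0):
    """Least-squares fit of the residual model (15.29), then u* from Eq. (15.30)."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        u = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        # regressors for (p2_11, p2_12, p2_22, p1_1, p1_2, p0)
        rows.append([0.5 * u[0] ** 2, u[0] * u[1], 0.5 * u[1] ** 2, u[0], u[1], 1.0])
        targets.append(quality(u, rng.gauss(0.0, sigma) if sigma > 0 else 0.0))
    G = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    c = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(6)]
    p11, p12, p22, q1, q2, _p0 = solve(G, c)
    det = p11 * p22 - p12 * p12
    return [-(p22 * q1 - p12 * q2) / det, -(-p12 * q1 + p11 * q2) / det]


print(extremum_estimate(sigma=0.0))   # recovers u* = (-0.125, -0.625) up to rounding
print(extremum_estimate(sigma=1.0))   # noisy estimate from 100 samples
```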


There are a number of problems with this approach. First, it is sometimes difficult to assume a purely quadratic relationship between the control parameter $u$ and the quality measure $V(u)$. Second, the two-step extremum control procedure is numerically sensitive to disturbances and unmodeled dynamics, as it involves a two-step optimization where errors in the first step have impact on the second step. Third, the input-output relationship is assumed to be purely static, and there are few systematic methods to modify the algorithm when system dynamics cannot be neglected.

Extensions to solve these problems require more elaborate nonlinear models such as Hammerstein models including an input nonlinearity

$$A(z^{-1})V_k = P_0 + u_k^TP_1 + \frac12u_k^TP_2u_k \tag{15.34}$$

or Wiener models with output nonlinearity

$$V_k = P_0 + y_k^TP_1 + \frac12y_k^TP_2y_k = P_0 + \sum_{i=1}^n h_iu_{k-i}^TP_1 + \frac12\sum_{i=1}^n\sum_{j=1}^n h_ih_ju_{k-i}^TP_2u_{k-j} \tag{15.35}$$

where $y_k = \sum_{i=1}^n h_iu_{k-i}$ is some intermediate signal that represents a filtered input $\{u_k\}$; see Chapter 14.

The estimation accuracy tends to increase at a very slow rate after a rapid initial improvement. Therefore, a successful application is contingent upon a careful implementation of both estimation and control calculations, which involve matrix inversions. Active test signals such as pulse sequences or sinusoids are sometimes proposed in order to improve the adaptation rate, although such signals imply suboptimal control, which is applied at a significant cost.


15.5 MODEL-REFERENCE ADAPTIVE CONTROL 

It is standard to define an adaptive control system as a control system coupled to an identification procedure. A certain skeptical attitude toward adaptive control is sometimes perceptible, with criticism concerning its unpredictable control actions and stability problems. One reason why adaptive control is more difficult than other problems in adaptive system theory is that two feedback loops interact in a difficult manner, i.e., the feedback control loop and the adaptation. In addition, formulating a control performance that is appropriate for an adaptive control system is not a trivial feat.
A way to simplify the formulation of an adaptive control problem is to request that the controlled system behave similarly to some given, well-defined system, called a reference model, representing some prescribed system behavior. In the terminology of neural networks this reference model serves as a teacher that trains the system to have properties similar to its own. Inverse models obtained by feedforward and feedback control are for this reason relevant, and several adaptive control algorithms provide systematic methods of identifying a system inverse so that desired outputs can be transformed into the inputs that generate them. Once an adequate prescribed behavior has been specified, it is straightforward to evaluate performance as well as various design options. One important approach to adaptive control design is the minimization of variance-like or energy-like functions of the system state and output. The need for a rigorous analysis suggests the use of Lyapunov theory, and we refer to some background material in Appendix 15.1.

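Assume that the control object can be described by the state equation

$$\dot x = Ax + Bu = \begin{pmatrix} -a_1 & -a_2 & \cdots & -a_{n-1} & -a_n\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \ddots & \vdots & \vdots\\ \vdots & \ddots & \ddots & 0 & 0\\ 0 & \cdots & 0 & 1 & 0\end{pmatrix}x + \begin{pmatrix} b_1\\ 0_{(n-1)\times1}\end{pmatrix}u \tag{15.36}$$

and that

$$u = -\theta^Tx \tag{15.37}$$

is an appropriate control law of suitable structure. In the case of a known $A$ it is possible to choose a suitable $\theta$ by means of model matching so that $A - B\theta^T = A_m$ for some dynamics matrix $A_m$ representing the prescribed system behavior. This gives the closed-loop system

$$\dot x = Ax + Bu = \begin{pmatrix} -a_1 - b_1\theta_1 & -a_2 - b_1\theta_2 & \cdots & -a_n - b_1\theta_n\\ 1 & 0 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 1 & 0\end{pmatrix}x = A_mx \tag{15.38}$$

Let (15.37) be replaced by the adaptive control law

$$\dot{\hat\theta} = S^{-1}xB^TPx, \quad S = S^T > 0, \qquad u = -\hat\theta^Tx \tag{15.39}$$

where $P$ solves the Lyapunov equation (see Appendix 15.1)

$$PA_m + A_m^TP = -Q \tag{15.40}$$

for some $Q = Q^T > 0$. With the parameter error $\tilde\theta = \hat\theta - \theta$, the system behavior under adaptive feedback control is

$$\dot x = (A - B\theta^T)x - Bx^T\tilde\theta = A_mx - Bx^T\tilde\theta \tag{15.41}$$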
In order to prove that the adaptive system defined by Eqs. (15.39-15.41) is stable, we consider the following Lyapunov function candidate (see Sec. 15.8)

$$V(x,\tilde\theta) = \frac12x^TPx + \frac12\tilde\theta^TS\tilde\theta, \qquad S = S^T > 0 \tag{15.42}$$

with the derivative

$$\frac{dV}{dt} = \frac12\dot x^TPx + \frac12x^TP\dot x + \frac12\dot{\tilde\theta}^TS\tilde\theta + \frac12\tilde\theta^TS\dot{\tilde\theta} = \frac12x^T(A_m^TP + PA_m)x - x^TPBx^T\tilde\theta + \tilde\theta^TS\dot{\tilde\theta} \tag{15.43}$$

If $\theta$ is constant, then $\dot{\tilde\theta} = \dot{\hat\theta}$, the last two terms of Eq. (15.43) cancel by Eq. (15.39), and

$$\frac{dV}{dt} = -\frac12x^TQx < 0, \qquad \|x\| \ne 0 \tag{15.44}$$

which shows that the system is globally stable in the sense of Lyapunov, i.e., a certain weighted sum of squared control errors and parameter errors is guaranteed to decrease in the course of time whenever $x \ne 0$. ■
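The matrix $P$ in (15.39)-(15.40) must be computed before the adaptive law can run. A minimal sketch, assuming a small state dimension: the Lyapunov equation is linear in the $n^2$ entries of $P$, so it can be vectorized and solved by Gaussian elimination (Appendix 15.1 may describe other methods). Applied to the $A_m$ and $Q = I$ of Example 15.4 below, it should reproduce the matrix (15.47).

```python
def gauss_solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def solve_lyapunov(A, Q):
    """Solve P A + A^T P = -Q for P, with unknown P[i][k] stored at index i*n + k."""
    n = len(A)
    M = [[0.0] * (n * n) for _ in range(n * n)]
    rhs = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            row = i * n + j
            rhs[row] = -Q[i][j]
            for k in range(n):
                M[row][i * n + k] += A[k][j]    # (P A)_ij = sum_k P_ik A_kj
                M[row][k * n + j] += A[k][i]    # (A^T P)_ij = sum_k A_ki P_kj
    x = gauss_solve(M, rhs)
    return [[x[i * n + k] for k in range(n)] for i in range(n)]


A_m = [[-3.0, -3.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
P = solve_lyapunov(A_m, I3)
print(P)   # matches Eq. (15.47) up to rounding
```

The unique solution exists because $A_m$ is a stability matrix. For larger $n$, the $n^2 \times n^2$ system becomes expensive, and dedicated Lyapunov solvers are preferable.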

Example 15.4—Adaptive control of a linear system 

Consider adaptive stabilization of the system

$$\begin{pmatrix}\dot x_1\\ \dot x_2\\ \dot x_3\end{pmatrix} = \begin{pmatrix} -a_1 & -a_2 & -a_3\\ 1 & 0 & 0\\ 0 & 1 & 0\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} + \begin{pmatrix} b_1\\ 0\\ 0\end{pmatrix}u \tag{15.45}$$

so that it behaves like the model

$$\dot x = \begin{pmatrix} -3 & -3 & -1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{pmatrix}x = A_mx \tag{15.46}$$

Application of control algorithm (15.39) for $Q = S = I_{3\times3}$ and $P$ solving the Lyapunov equation $PA_m + A_m^TP = -Q$ gives

$$P = \begin{pmatrix} 0.4375 & 0.8125 & 0.5000\\ 0.8125 & 3.2500 & 1.9375\\ 0.5000 & 1.9375 & 2.3125\end{pmatrix} \tag{15.47}$$

A simulation of this adaptive algorithm for $a_1 = a_2 = a_3 = -1$ and $b_1 = 1$ is shown in Fig. 15.7, in which typical transients of control and adaptation are exhibited. ■
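A minimal simulation sketch of Example 15.4, assuming the stated values $a_1 = a_2 = a_3 = -1$, $b_1 = 1$, $Q = S = I$, and $P$ from (15.47). The initial state $x(0) = (0.1, 0, 0)^T$, $\hat\theta(0) = 0$, and the RK4 integration are assumptions of this sketch, not of the text. Since (15.44) gives $\dot V = -\frac12|x|^2$, the recorded Lyapunov function (15.42) should never increase beyond integration error, even though the open-loop system is unstable.

```python
A_ROW = (1.0, 1.0, 1.0)        # (-a1, -a2, -a3) with a1 = a2 = a3 = -1
B1 = 1.0
AM_ROW = (-3.0, -3.0, -1.0)    # first row of A_m, Eq. (15.46)
P = [[0.4375, 0.8125, 0.5000],
     [0.8125, 3.2500, 1.9375],
     [0.5000, 1.9375, 2.3125]]  # Eq. (15.47), Q = S = I
THETA = [(a - am) / B1 for a, am in zip(A_ROW, AM_ROW)]   # ideal theta: A - B theta^T = A_m


def rhs(s):
    """s = (x1, x2, x3, th1, th2, th3); Eqs. (15.45) and (15.39) with S = I."""
    x, th = s[:3], s[3:]
    u = -sum(t * xi for t, xi in zip(th, x))                 # u = -theta_hat^T x
    btpx = B1 * sum(P[0][j] * x[j] for j in range(3))        # B^T P x
    dx = [sum(a * xi for a, xi in zip(A_ROW, x)) + B1 * u, x[0], x[1]]
    return dx + [xi * btpx for xi in x]                      # theta_hat' = x B^T P x


def lyap(s):
    """V of Eq. (15.42) with S = I."""
    x, th = s[:3], s[3:]
    xpx = sum(x[i] * P[i][j] * x[j] for i in range(3) for j in range(3))
    return 0.5 * xpx + 0.5 * sum((t - t0) ** 2 for t, t0 in zip(th, THETA))


def simulate(h=1e-3, t_end=20.0):
    s = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
    vs = [lyap(s)]
    for _ in range(int(round(t_end / h))):
        k1 = rhs(s)
        k2 = rhs([a + h / 2 * b for a, b in zip(s, k1)])
        k3 = rhs([a + h / 2 * b for a, b in zip(s, k2)])
        k4 = rhs([a + h * b for a, b in zip(s, k3)])
        s = [a + h / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(s, k1, k2, k3, k4)]
        vs.append(lyap(s))
    return s, vs


state, v_hist = simulate()
print(v_hist[0], v_hist[-1])   # V decreases, as Eq. (15.44) predicts
```

Note that $\hat\theta$ need not converge to the matching value `THETA`; Lyapunov stability guarantees only that $V$ does not increase.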
Figure 15.7 Model reference adaptive control of Example 15.4.


Example 15.5 Adaptive control of robot manipulators 

The adaptive state feedback methods presented can be extended to certain classes of nonlinear systems. Consider the rigid-body motion of robot manipulators as elaborated in Example 7.3 with nonlinear motion equations

$$M(q)\ddot q + C(q,\dot q)\dot q + G(q) = \tau \tag{15.48}$$

expressed in some generalized coordinates $q \in \mathbb R^n$. Assume that the matrices $M$, $C$, $G$ have a known structure but that some parameters might be unknown. An adaptive control objective to follow some given trajectory $q_r$ can be approached by means of an energy-like Lyapunov function such as

$$V(x,\tilde\theta,t) = \frac12x^TT_0^T\begin{pmatrix} M(q) & 0\\ 0 & K\end{pmatrix}T_0x + \frac12\tilde\theta^TS\tilde\theta \tag{15.49}$$

with the tracking-error state $x = (\dot{\tilde q}^T, \tilde q^T)^T$, $\tilde q = q - q_r$. Let $Q = Q^T > 0$, $R = R^T > 0$ be some weighting matrices. It can be shown that for certain choices of $Q$, $R$ it is possible to generate stable adaptive control for $K = K^T > 0$, $T_0$ solving the algebraic matrix equation

$$\begin{pmatrix} 0 & K\\ K & 0\end{pmatrix} + Q - T_0^TBR^{-1}B^TT_0 = 0 \tag{15.50}$$

where

$$T_0 = \begin{pmatrix} T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}, \qquad B = \begin{pmatrix} I_{n\times n}\\ 0_{n\times n}\end{pmatrix} \tag{15.51}$$

Let the control law generated by means of $V$ be expressed in terms of unknown parameters $\theta \in \mathbb R^p$ of $M$, $C$, $G$ and the data matrices $\Phi \in \mathbb R^{n\times p}$, $\varphi_0 \in \mathbb R^n$. The vector $\varphi_0$ contains terms of $\tau^*$ that can be computed without reference to unknown or uncertain parameters. In the special case of a diagonal submatrix $T_{11}$ we have

$$\tau^* = M(q)(\ddot q_r - T_{11}^{-1}T_{12}\dot{\tilde q}) + C(q,\dot q)(\dot q - T_{11}^{-1}B^TT_0x) + G(q) - R^{-1}B^TT_0x = \Phi\theta + \varphi_0 \tag{15.52}$$

The following adaptation law

$$\dot{\hat\theta} = -S^{-1}\Phi^TB^TT_0x \tag{15.53}$$

assures for constant parameters $\theta$ that the derivative

$$\dot V = -x^T(Q + T_0^TBR^{-1}B^TT_0)x \le 0 \tag{15.54}$$

Thus the system is shown to be globally stable in the sense of Lyapunov, and the adaptation eventually optimizes the control system. Simulations of these control principles applied to the robot in Example 7.3 with adaptation with respect to the work load $m_2$ give results according to Fig. 15.8. This simulation describes a weight-lifting operation from an initial position with all robot arm segments resting on the ground, i.e., $q_1(0) = q_2(0) = \pi/2$, and the prescribed final position $q_1(\infty) = q_2(\infty) = 0$. ■


Self-tuning minimum variance control 

Model-reference adaptive systems are characterized by their gradient search for a set of suitable control parameters. These ideas can also be applied to discrete-time systems, and such direct adaptive control methods can be formulated as alternatives to the indirect adaptive control presented previously. Hence, consider a discrete-time system modeled as

$$\begin{cases} A(z^{-1})y_k = b_0z^{-1}B(z^{-1})u_k + C(z^{-1})w_k, & \text{Control object}\\ R(z^{-1})u_k = -S(z^{-1})y_k, & \text{Regulator}\end{cases} \tag{15.55}$$
Figure 15.8 Simulation of a robot subject to the self-optimizing adaptive control law. Upper graphs show $q_1$, $q_2$ and middle graphs $\tau_1$ and $\tau_2$. The lower left graph shows the estimate $\hat m_2$ of $m_2$, and the lower right graph the Lyapunov function that decreases everywhere. All graphs versus time [s].


from the input $\{u_k\}$ and the zero-mean normally distributed white-noise process $\{w_k\}$ to the output $\{y_k\}$ with coprime polynomials

$$\begin{aligned} A(z^{-1}) &= 1 + a_1z^{-1} + \cdots + a_nz^{-n}\\ B(z^{-1}) &= 1 + b_1z^{-1} + \cdots + b_{n-1}z^{-n+1}\\ C(z^{-1}) &= 1 + c_1z^{-1} + \cdots + c_nz^{-n}\end{aligned} \tag{15.56}$$

and where $R$ and $S$ are suitable polynomials in the backward shift operator. The $B$- and $C$-polynomials are assumed to have no zeros for $|z| > 1$, and the parameter $b_0$ is a gain factor. The coprimeness of $A$, $B$, and $C$ assures that the control object model (15.55) also corresponds to a state-space realization of order $n$ and also to the fractional form (see Fig. 15.9)
Figure 15.9 Block diagram of the self-tuning regulator with a noise model. Notice that a correctly tuned minimum variance regulator totally decouples $\xi$ from noise interference of $w$.

with the noise components $e_k$ and $v_k$ for polynomials $F$ and $G$ satisfying the polynomial Diophantine equation

$$AF + z^{-1}G = C \tag{15.58}$$


The use of pole assignment as a design principle suggests that $R$ and $S$ are chosen so that the denominator polynomial $RA + b_0z^{-1}SB$ of the closed-loop system matches some prescribed polynomial $P$, cf. Eq. (15.7). Inverse modeling of the output to achieve the pole assignment $RA + b_0z^{-1}SB = P$ can be made by means of the following expansion based on Eq. (15.57) (argument $z^{-1}$ omitted)

$$y_{k+1} = \frac{b_0B}{A}u_k + \frac{C}{A}w_{k+1} = R\Big(\frac{b_0B}{P}u_k\Big) + S\Big(\frac{b_0B}{P}y_k\Big) + \frac{RG - b_0BSF}{P}w_k + Fw_{k+1} \tag{15.59}$$


This expression is in linear regression form with respect to the feedback parameters of $R$ and $S$, and the regressors consist of filtered input-output data $\{u_k\}$ and $\{y_k\}$. The corresponding noise model is particularly simple so that the
$$
\begin{aligned}
P &= b_0 BC\\
R &= b_0 BF = b_0 B\\
S &= G = (c_1 - a_1) + \dots + (c_n - a_n) z^{-n+1}
\end{aligned}
\qquad (15.60)
$$

which is the appropriate minimum variance regulator to use in the case of known parameters. Assuming that the input $\{u_k\}$ and the output $\{y_k\}$ are available for measurement, introduce the parameter vector $\theta$ and the regressor vector $\{\phi_k\}$ as


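$$
\theta = \begin{pmatrix} r_1 & \dots & r_{n-1} & s_0 & \dots & s_{n-1}\end{pmatrix}^T, \qquad
\phi_k = \begin{pmatrix} u_{k-1} & \dots & u_{k-n+1} & y_k & \dots & y_{k-n+1}\end{pmatrix}^T
\qquad (15.61)
$$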


Direct minimum variance adaptive control ad modum Åström-Wittenmark (1973) then comprises the following steps.


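$$
\begin{aligned}
\hat\theta_k &= \hat\theta_{k-1} + P_k\phi_{k-1}\varepsilon_k\\
P_k &= P_{k-1} - \frac{P_{k-1}\phi_{k-1}\phi_{k-1}^TP_{k-1}}{1 + \phi_{k-1}^TP_{k-1}\phi_{k-1}}\\
\varepsilon_k &= y_k - \beta_0u_{k-1} - \hat\theta_{k-1}^T\phi_{k-1}
\end{aligned}
\qquad (15.62)
$$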


where the vector of estimated parameters $\hat\theta$ has replaced the parameters $\theta$ of the correct desired control law, whereas $\beta_0$ is a fixed a priori estimate of the gain factor $b_0$. Convergence analysis of $\hat\theta$ and of the output error can be made by means of the function

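$$
\begin{aligned}
V_k &= -\log p(\tilde\theta_k \mid \mathcal{F}_k) + \log\Big(1 + \frac{1}{\sigma^2}x_k^TQx_k\Big)\\
&= \frac{n}{2}\log 2\pi + \frac12\log\det P_k + \frac12\tilde\theta_k^TP_k^{-1}\tilde\theta_k + \log\Big(1 + \frac{1}{\sigma^2}x_k^TQx_k\Big)
\end{aligned}
\qquad (15.63)
$$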

The first term of the function $V_k$ consists of the log-likelihood function for the parameter error $\tilde\theta_k$, and the second term is the logarithm of a signal-to-noise ratio with

$$x_k = \begin{pmatrix}\xi_{k-1} & \xi_{k-2} & \dots & \xi_{k-2n+1}\end{pmatrix}^T \qquad (15.64)$$
and $\sigma^2 = \ldots$. In information theory, the mathematical expectation of $V_k$ is known as the system entropy, which is often interpreted as a measure of the disorder of the system. It can be shown that the entropy function $V_k$ develops as a supermartingale, i.e.,

$$E\{V_{k+1} \mid \mathcal{F}_k\} \leq V_k \quad \text{a.s.} \qquad (15.65)$$


c5=J)-| >0 ’ and 0< J <2 (1566) 

As $V_k$ is not positive definite with respect to $P_k$, it does not qualify as a Lyapunov function. However, the interpretation of $V_k$ as a log-likelihood function admits the conclusion that the parameters converge toward the minimum-variance parameters and that the state vector $x_k$ decreases in magnitude so that $\|x_k\| \to 0$.
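The recursion in Eq. (15.62) is the standard recursive least-squares update. A minimal Python sketch for a generic regression $y = \theta^T\phi + \text{noise}$ follows; in (15.62) the regression target would be $y_k - \beta_0u_{k-1}$ and the regressor $\phi_{k-1}$. The function name `rls_update` and the noise-free test data are illustrative assumptions.

```python
import numpy as np

def rls_update(theta, P, phi, y):
    """One recursive least-squares step for y = theta^T phi + noise.

    Returns the updated estimate, the updated P-matrix and the a priori
    prediction error eps = y - theta^T phi (cf. Eq. (15.62)).
    """
    Pphi = P @ phi
    # P is symmetric, so P phi phi^T P equals outer(P phi, P phi)
    P_new = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
    eps = y - theta @ phi
    theta_new = theta + (P_new @ phi) * eps    # P_new phi is the RLS gain
    return theta_new, P_new, eps

# Noise-free illustration with hypothetical "true" parameters
rng = np.random.default_rng(1)
theta_true = np.array([0.5, -1.2, 2.0])
theta_hat = np.zeros(3)
P = 1e4 * np.eye(3)                  # large initial P = weak prior
for _ in range(50):
    phi = rng.standard_normal(3)
    theta_hat, P, eps = rls_update(theta_hat, P, phi, theta_true @ phi)
```

A large initial $P$ makes the recursion behave like ordinary least squares after a few samples; a smaller one acts as a stronger prior on $\hat\theta_0$.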

Example 15.6—Self-tuning minimum variance control 

Consider the following ARMAX model with input $\{u_k\}$, output $\{y_k\}$, and the zero-mean white-noise sequence $\{w_k\}$:


$$y_{k+1} = -ay_k + u_k + w_{k+1} + cw_k \qquad (15.67)$$


which can be expressed as the following state-space model 


$$x_{k+1} = -ax_k + u_k + (c - a)w_k, \qquad y_k = x_k + w_k \qquad (15.68)$$


Minimum variance adaptive control can be applied to this system by means of the algorithm


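$$
\begin{aligned}
\phi_k &= y_k\\
\varepsilon_k &= y_k\\
P_{k+1} &= P_k - \frac{P_k\phi_k\phi_k^TP_k}{1 + \phi_k^TP_k\phi_k}\\
\hat\theta_{k+1} &= \hat\theta_k + P_{k+1}\phi_k\varepsilon_{k+1}\\
u_k &= -\hat\theta_k\phi_k
\end{aligned}
\qquad (15.69)
$$

Here the prediction error $y_{k+1} - u_k - \hat\theta_k\phi_k$ reduces to $y_{k+1}$ because $u_k = -\hat\theta_k\phi_k$.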


The result of a simulation is shown in Fig. 15.10, which shows parameter convergence toward the minimum variance control parameter $\theta = c - a = 1.6$ and convergence of $|x_k|$ toward zero. ■
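The self-tuning loop (15.69) for Example 15.6 can be simulated in a few lines. The sketch below assumes the parameter values of Fig. 15.10 ($a = -0.9$, $c = 0.7$, so the minimum variance parameter is $c - a = 1.6$), unit-variance Gaussian noise, a zero initial estimate, and adaptation from $k = 0$ rather than from $t = 200$; these choices and the function name are illustrative only. The estimate should approach 1.6, and the state $x_k = y_k - w_k$ should become small.

```python
import numpy as np

def self_tuning_mv(a=-0.9, c=0.7, n_steps=20000, p0=1.0, seed=0):
    """Simulate y[k+1] = -a y[k] + u[k] + w[k+1] + c w[k] (Eq. 15.67)
    under the self-tuning minimum variance law of Eq. (15.69)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_steps + 1)
    y = np.zeros(n_steps + 1)
    theta_hat, P = 0.0, p0                  # scalar estimate and scalar P
    history = np.zeros(n_steps)
    for k in range(n_steps):
        phi = y[k]
        u = -theta_hat * phi                            # u_k = -theta_k phi_k
        y[k + 1] = -a * y[k] + u + w[k + 1] + c * w[k]  # plant (15.67)
        eps = y[k + 1]     # = y[k+1] - u - theta_hat*phi, since u cancels the last term
        P = P - (P * phi) ** 2 / (1.0 + phi * P * phi)
        theta_hat = theta_hat + P * phi * eps
        history[k] = theta_hat
    return history, y, w

history, y, w = self_tuning_mv()
x = y - w                                   # state of (15.68), x_k = y_k - w_k
print(history[-1], np.mean(x[-1000:] ** 2))
```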
Figure 15.10 Minimum variance adaptive control starting at time $t = 200$ for a first-order ARMAX model $y_k - 0.9y_{k-1} = u_{k-1} + w_k + 0.7w_{k-1}$. Notice that the state $\{x_k\} = \{y_k\} - \{w_k\}$ eventually becomes noise-free as a result of the adaptive control.
Stability and robustness

Practical application of adaptive system theory requires, for reliability reasons, some kind of stability proof. Another important aspect is robustness, which, roughly speaking, means the ability to tolerate “small” deviations from the model class assumed during the algorithm design. In fact, engineering application of adaptive techniques has been hampered by severe criticism concerning their closed-loop stability and robustness properties. One aspect of stability that has attracted much attention is that adaptive state feedback control of the type presented in Eq. (15.39) requires measurement of the full state vector for its implementation. A natural question is whether measurement of some output variable might suffice. Unfortunately, the answer to this question is restrictive, as shown by the following theorem and example.

Theorem 15.1—Yakubovich-Kalman

Let the pair $(A, b)$ be controllable. Then there exist positive definite matrices $Q$ and $P$, a vector $w$, and a scalar $\gamma$ solving the equations

$$
\begin{aligned}
Q + PA + A^TP &= -ww^T\\
Pb - c &= \gamma w
\end{aligned}
\qquad (15.70)
$$

if and only if the transfer function

$$G(s) = \gamma + c^T(sI - A)^{-1}b \qquad (15.71)$$

is such that

$$\operatorname{Re} G(i\omega) > 0, \quad \omega \in \mathbb{R} \qquad (15.72)$$


■ 
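Condition (15.72) can be screened numerically by evaluating $G(i\omega) = \gamma + c^T(i\omega I - A)^{-1}b$ on a frequency grid. The Python sketch below does this. A finite grid can only refute positive realness, not prove it, and the two test systems are hypothetical examples: $1/(s+1)$, which satisfies (15.72), and $1/((s+1)(s+2))$, whose phase lag exceeds 90° at high frequencies.

```python
import numpy as np

def freq_response(A, b, c, gamma, omegas):
    """G(i w) = gamma + c^T (i w I - A)^{-1} b evaluated on a frequency grid."""
    n = A.shape[0]
    I = np.eye(n)
    return np.array([gamma + c @ np.linalg.solve(1j * w * I - A, b) for w in omegas])

def min_real_part(A, b, c, gamma, omegas):
    """Smallest Re G(i w) on the grid; a negative value violates (15.72)."""
    return float(np.min(freq_response(A, b, c, gamma, omegas).real))

omegas = np.linspace(0.0, 100.0, 2001)

# G1(s) = 1/(s+1)
A1, b1, c1 = np.array([[-1.0]]), np.array([1.0]), np.array([1.0])
# G2(s) = 1/(s+1) - 1/(s+2) = 1/((s+1)(s+2))
A2, b2, c2 = np.diag([-1.0, -2.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])

print(min_real_part(A1, b1, c1, 0.0, omegas))   # positive
print(min_real_part(A2, b2, c2, 0.0, omegas))   # negative
```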


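Example 15.7—Adaptive control by means of output feedback
Assume that the complete state vector $x$ is not available for measurement, but only some output $y = c^Tx$. With the control law $u = -\hat\theta^Ty$ and the parameter error $\tilde\theta = \hat\theta - \theta$,

$$\dot x = Ax + bu = (A - b\theta^Tc^T)x - bx^Tc\,\tilde\theta = A_mx - bx^Tc\,\tilde\theta \qquad (15.73)$$

Consider the Lyapunov function candidate

$$V(x, \tilde\theta) = \frac12x^TPx + \frac12\tilde\theta^TS\tilde\theta \qquad (15.74)$$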
$$
\begin{aligned}
Q + PA_m + A_m^TP &= 0\\
Pb &= c\\
\gamma &= 0
\end{aligned}
\qquad (15.75)
$$


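for $Q = Q^T > 0$ and $Pb = c$, as obtained from the Yakubovich-Kalman lemma. A Lyapunov function similar to Eq. (15.42) has the time derivative

$$\dot V = \frac12x^T(A_m^TP + PA_m)x - x^TPb\,x^Tc\,\tilde\theta + \tilde\theta^TS\dot{\tilde\theta} = -\frac12x^TQx \qquad (15.76)$$

for the algorithm

$$
\begin{aligned}
\dot{\hat\theta} &= S^{-1}yy^T = S^{-1}y\,b^TPx\\
u &= -\hat\theta^Ty
\end{aligned}
\qquad (15.77)
$$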


This example demonstrates that the Yakubovich-Kalman theorem is restrictive for the application of output feedback adaptive control, as it requires that $Y(s)/U(s) = G(s) = c^T(sI - A)^{-1}b$ satisfy $\operatorname{Re}G(i\omega) > 0$ for all $\omega \in \mathbb{R}$ in the case where no linear combination of $x$ other than $y = c^Tx$ is available for measurement. ■


As yet there are few effective and systematic means to design adaptive control for systems that do not fulfill the “positive real” condition of the Yakubovich-Kalman theorem. In practice, this means that full state information is required in order to apply continuous-time adaptive feedback control.

Another question is whether the assumptions made on model order are restrictive for adaptive control methods. Unfortunately, the answer to this question is also affirmative, as elaborated in the following example adapted from Rohrs and co-workers.†

Example 15.8—Stability problem of adaptive control 

Assume that the control object is adequately described by the transfer function

$$Y(s) = \frac{2}{s+1}\cdot\frac{229}{s^2 + 30s + 229}\,U(s) \qquad (15.78)$$


† This example is based, with permission, on the paper of C. E. Rohrs, L. Valavani, M. Athans, and G. Stein, “Robustness of adaptive control algorithms in the presence of unmodeled dynamics,” Proc. IEEE Conf. Decision and Control, Orlando, Florida, 1982. ©1982 IEEE.





Chap. 15 Adaptive systems 





Figure 15.11 An adaptive Rohrs’ system exhibiting unstable behavior in response to a sinusoidal reference signal.

whose transient responses are in good agreement with the behavior of a simplified model

$$Y(s) = \frac{2}{s+1}\,U(s) \qquad (15.79)$$

Furthermore, assume that a suitable reference model would be

$$Y_r(s) = \frac{3}{s+3}\,R(s) \qquad (15.80)$$

A simple proportional control law based on Eq. (15.79) that matches the reference model Eq. (15.80) is

$$u(t) = -\theta_yy(t) + \theta_rr(t) = -y(t) + 1.5r(t) \qquad (15.81)$$

whereas it gives the following closed-loop system when applied to the original transfer function Eq. (15.78):

$$\frac{Y(s)}{R(s)} = \frac{687}{s^3 + 31s^2 + 259s + 687} \qquad (15.82)$$


The reference signal was chosen as

$$r(t) = 0.3 + 1.85\sin\omega_0t, \qquad \omega_0 = 16.1\ \text{[rad/s]} \qquad (15.83)$$
Figure 15.12 A Rohrs’ system which exhibits divergent adaptive control.


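with $\omega_0$ being the frequency of 180° phase lag of the closed-loop system Eq. (15.82). A simple adaptive control system (see Fig. 15.11), based on gradient search and designed for a first-order control object, uses the following algorithm

$$
\begin{aligned}
e(t) &= y(t) - y_r(t) && \text{control error}\\
\phi(t) &= \begin{pmatrix} r(t) & -y(t)\end{pmatrix}^T && \text{regression vector}\\
\theta(t) &= \begin{pmatrix}\theta_r & \theta_y\end{pmatrix}^T && \text{parameter vector}\\
\dot\theta &= -\Gamma\phi(t)e(t) && \text{adaptation}\\
u &= \theta^T\phi && \text{control law}
\end{aligned}
$$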


It is, unfortunately, obvious from the simulation in Fig. 15.12 that the system has a divergent response to the reference signal. This behavior is reproducible even when starting at the correct initial parameter estimates $\theta_r(0) = 1.5$ and $\theta_y(0) = 1$. ■
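The closed-loop polynomial (15.82) and the frequency $\omega_0$ in (15.83) can be checked with a few polynomial operations. The sketch below uses only Eqs. (15.78) and (15.81). It also evaluates the full plant at $\omega_0$, where its phase lag is likewise exactly 180°, so $\operatorname{Re}G(i\omega_0) < 0$ and the positive-real condition (15.72) fails for this plant.

```python
import numpy as np

# Plant (15.78): G(s) = 2/(s+1) * 229/(s^2 + 30 s + 229), descending powers of s
num = np.array([2.0 * 229.0])
den = np.polymul([1.0, 1.0], [1.0, 30.0, 229.0])

theta_y, theta_r = 1.0, 1.5             # control law (15.81): u = -theta_y y + theta_r r
cl_num = theta_r * num                  # Y/R = theta_r G / (1 + theta_y G)
cl_den = np.polyadd(den, theta_y * num)

# cl_den(s) = s^3 + d2 s^2 + d1 s + d0; Im cl_den(i w) = d1 w - w^3 vanishes at w = sqrt(d1)
omega0 = np.sqrt(cl_den[2])
s0 = 1j * omega0
H0 = np.polyval(cl_num, s0) / np.polyval(cl_den, s0)   # closed loop at omega0
G0 = np.polyval(num, s0) / np.polyval(den, s0)         # plant at omega0

print(cl_den, cl_num)        # coefficients 1, 31, 259, 687 and 687
print(omega0)                # about 16.09 rad/s
print(np.degrees(np.angle(H0)), G0.real)
```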

The example shows that heuristic model reduction in the context of adaptive control might produce very poor results. It also shows that gradient optimization methods are susceptible to correlated disturbances. This example is, of course, also relevant for neural network theory, a fact which is sometimes overlooked.


15.6 MULTIVARIABLE DIRECT ADAPTIVE CONTROL 


The formulation of multi-input, multi-output adaptive control gives rise to 
some specific mathematical problems. One issue is the problem of system representation associated with cross-coupling and system zeros of multivariable
systems. A related problem is the parametrization problem. It is, for instance, 
not trivial to extend notions of gain from single-input, single-output systems 
to multi-input, multi-output systems. 

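Consider a time-invariant, finite-dimensional, linear and causal control object with input $\{u_k\}$ and output $\{y_k\}$ and the system equation

$$
\begin{aligned}
A(z^{-1})\xi_k &= u_k\\
y_k &= B(z^{-1})\xi_k
\end{aligned}
\quad\text{with}\quad y_k, u_k, \xi_k \in \mathbb{R}^n, \quad
\begin{cases}\operatorname{rank}A(z^{-1}) = n\\ \operatorname{rank}B(z^{-1}) = n\end{cases}
\qquad (15.84)
$$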

It is assumed that $A$ and $B$ are right co-prime polynomial matrices, i.e., $A$ and $B$ have no common right factors except for unimodular matrices, and that $A$ and $B$ have no common zeros for $|z| > 0$. The transfer operator $H_0(z) = B(z^{-1})A^{-1}(z^{-1})$ from $u_k$ to $y_k$ is assumed to be strictly proper with respect to the forward shift operator $z$, and the control object is assumed to be stabilizable from the control input $u_k$.


Consider a linear, causal, and finite-dimensional regulator of the type

$$R_u(z^{-1})u_k = -S_y(z^{-1})y_k + T(z^{-1})r_k \qquad (15.85)$$

which generates a control signal $u_k$ by means of feedback of $y_k$ and a reference signal $r_k$. The closed-loop system obtained from Eq. (15.84) and Eq. (15.85) has the transfer operator relationship

$$y_k = B(z^{-1})\big(R_u(z^{-1})A(z^{-1}) + S_y(z^{-1})B(z^{-1})\big)^{-1}T(z^{-1})r_k \qquad (15.86)$$

which should be designed to be stable and able to follow the reference signal $r_k$. The stability requirement also implies internal stability which, in turn, implies that the closed-loop system must not have any uncontrollable or unobservable unstable mode. It is therefore necessary that the polynomial matrices $R_u(z^{-1})$ and $S_y(z^{-1})$ have no common zeros in the unstable region of the $z$-plane, i.e., for $|z| > 1$.

Consider the closed-loop system in Eq. (15.86). It is clear already from the single-input, single-output case that attempts to cancel the zeros for $|z| > 0$ of the control object will result in unstable modes. An admissible regulator will therefore result in a closed-loop system of the type

$$y_k = B_{LS}(z^{-1})T_R(z^{-1})r_k \qquad (15.87)$$

where $B_{LS}(z^{-1})$, the left structure matrix, represents the part of the $B(z^{-1})$-matrix which is noninvertible from the right by a control law (15.85) and thus invariant; see (Pernebo, 1981). Choice of the matrix $T_R(z^{-1})$, however, is not further restricted by such stability considerations as long as it is chosen as a stable and causal rational matrix.

The zeros for $|z| > 0$ of a system may be described by the Smith form (see Appendix A)

$$B(z^{-1}) = U(z^{-1})E(z^{-1})V(z^{-1}) \qquad (15.88)$$

Here the $E$-matrix contains the invariant polynomials on the diagonal, with all nondiagonal elements of $E(z^{-1})$ being zero, whereas the unimodular polynomial matrices $U$ and $V$ have stable and causal inverses. The left structure matrix $B_{LS}(z^{-1})$ can thus be represented by the product of the polynomial matrices $U(z^{-1})$ and $E(z^{-1})$. The inclusion of $U(z^{-1})$ in the left structure matrix derives from the fact that the regulator identities in (15.85) only operate from the right. The left structure matrix $B_{LS}(z^{-1})$ thus contains information about the cross-couplings and the system zeros for $|z| > 1$. Therefore, any reference model to be followed perfectly must contain some representation of this system invariant. Accordingly, let the class of reference models be restricted to models of the form

$$
\begin{aligned}
A_M(z^{-1})r_k &= B_M(z^{-1})u_{c,k}\\
y_k^M &= B_{LS}(z^{-1})T_R(z^{-1})r_k
\end{aligned}
\qquad (15.89)
$$

where $u_{c,k}$ is some command signal that generates a filtered reference signal $r_k$, and where $y_k^M$ is the model output to be tracked perfectly. The compensator $T_R(z^{-1})$ can be chosen without restrictions as long as it is stable, and the matrices $A_M(z^{-1})$ and $B_M(z^{-1})$ should be left co-prime polynomial matrices for $|z| > 1$. The factorization between $A_M(z^{-1})$ and $B_M(z^{-1})$ is usually chosen such that $A_M(0) = I_{n\times n}$.

As prior knowledge of cross-couplings and zeros is unrealistic to assume in the context of adaptive control, it appears necessary to formulate a method that allows $B_{LS}(z^{-1})$ to be incorporated without actually knowing it. This problem can be solved by certain choices of the compensator $T_R(z^{-1})$ so that $B_{LS}(z^{-1})T_R(z^{-1})$ becomes less complicated than $B_{LS}(z^{-1})$ itself. The dynamic decoupling problem would then include the problem of finding a stable $T_R(z^{-1})$ such that

$$B_{LS}(z^{-1})T_R(z^{-1}) = B_D(z^{-1}) \quad\text{with}\quad B_D(1) = I_{n\times n} \qquad (15.90)$$

for some feasible polynomial matrix $B_D(z^{-1})$. In order to fulfill the desired input-output relationship and to eliminate cross-couplings, it would be suitable to choose $T_R(z^{-1})$ in such a way that $B_D(z^{-1})$ becomes a diagonal matrix. A problem formulation in terms of model reference adaptive control would then include the problem of finding $T_R(z^{-1})$ adaptively so that $B_{LS}$ need not be known.

Lemma 15.1

A square $n\times n$ strictly proper transfer operator $H_0(z)$ of full rank and with $\det H_0(1) \neq 0$ may be decomposed into a right co-prime factorization $(A(z^{-1}), B(z^{-1}))$ for $|z| > 1$, i.e., $\operatorname{rank}\begin{pmatrix}A(z^{-1})\\ B(z^{-1})\end{pmatrix} = n$ for $|z| > 1$. The factorization is such that $A(z^{-1})$ contains all the poles of $H_0(z)$ and $B(z^{-1})$ contains all zeros of $H_0(z)$ for $|z| > 0$, and

$$H_0(z) = B(z^{-1})A^{-1}(z^{-1}) = B_L(z^{-1})B_S(z^{-1})B_R(z^{-1})A^{-1}(z^{-1}) \qquad (15.91)$$

where $A$ is a square, full-rank polynomial matrix and $B_L$, $B_R$ are polynomial matrices with stable and causal inverses. The diagonal polynomial matrix $B_S$ satisfies the additional requirements that $A(0) = I_{n\times n}$ and $B_S(1) = I_{n\times n}$, and that $B_R(0) = I_{n\times n} + B_0$ is upper right triangular and invertible. The diagonal polynomial matrix $B_S(z^{-1})$ has entries which contain no polynomial factors except for the zeros at infinity and the finite zeros for $|z| > 1$ of $H_0(z)$.

Proof: See Johansson (1987). ■

In control design based on pole assignment, it is customary to use a polynomial factor $A_0(z^{-1})$ with the interpretation of an observer polynomial. An important property of such a factor is that it can be used to modify the control behavior although it does not appear explicitly in the closed-loop transfer function between reference $r_k$ and output $y_k$. Introduce for this reason a polynomial matrix $A_0(z^{-1})$ with no poles or zeros for $|z| > 0$ and with a stable and causal inverse. Pole-assignment design can now be approached by means of the following theorem:

Theorem 15.2

The transfer operator of the closed-loop system (15.86) and the reference model are identical if $R_u$, $S_y$, and $T$ are chosen from the polynomial matrix solutions to the following polynomial equations for given diagonal matrices $B_S$ and $B_D$:

$$
\begin{aligned}
T_2A_0A_MB_LB_S &= B_SA_0A_M\\
R_uA + S_yB &= A_0A_MB_R\\
B_ST &= T_2A_0A_MB_D
\end{aligned}
\qquad (15.92)
$$

Proof: For proof of existence of these solutions, refer to (Johansson, 1987). The resulting closed-loop transfer operator from reference $r_k$ to output $y_k$ is then

$$
\begin{aligned}
H &= B(R_uA + S_yB)^{-1}T = B_LB_SB_R(A_0A_MB_R)^{-1}T\\
&= (T_2A_0A_M)^{-1}B_S(A_0A_M)(A_0A_M)^{-1}T = (T_2A_0A_M)^{-1}B_ST\\
&= (T_2A_0A_M)^{-1}(T_2A_0A_M)B_D = B_D
\end{aligned}
\qquad (15.93)
$$

which reproduces the requested input-output behavior in Eq. (15.90) for the choice $T_R = (A_0A_M)^{-1}T$. The calculations involved in Eq. (15.93) are possible as $A_0$, $A_M$, and $T_2$ are stable and causal polynomial matrices with stable and causal inverses. ■

An estimation model

Consider the controlled outputs (argument $z^{-1}$ omitted)

$$
\begin{aligned}
y_k &= B(A_0A_MB_R)^{-1}(R_uA + S_yB)\xi_k\\
&= B_LB_S(A_0A_M)^{-1}(R_uA + S_yB)\xi_k
\end{aligned}
\qquad (15.94)
$$

Application of the equations of Lemma 15.1 gives the relationship

$$T_2A_0A_My_k = B_S(R_uu_k + S_yy_k) \qquad (15.95)$$

Now define (argument $z^{-1}$ omitted)

$$e_k^f = A_0A_Me_k = A_0A_M(y_k - y_k^M) = A_0A_M(y_k - B_Dr_k) \qquad (15.96)$$

Substitution of these relationships into Eq. (15.95) gives the linear estimation model

$$T_2(z^{-1})e_k^f = B_S(z^{-1})\big(u_k + R_0u_k + R_r(z^{-1})u_k + S_y(z^{-1})y_k - T(z^{-1})r_k\big) \qquad (15.97)$$

in the unknown $R_0$, $R_r = R_u - R_u(0)$, $S_y$, $T_2$, $T$ and the input-output data $e_k^f$, $u_k$, $y_k$, $r_k$. Filtering by means of $B_S(z^{-1})$ provides the suitable regressor elements required for linear estimation. Notice that the diagonal matrix $B_S(z^{-1})$ contains only the time delays and other non-invertible system zeros of the control object.

Example 15.9—Multivariable direct adaptive control 

Consider the multi-input, multi-output system 

r W - ££) (J_, ! TZ -') m 

(15.98) 

Some interesting cross-coupling features appear when $b_2$ is nonzero, in which case it appears difficult to achieve independent control of the two outputs. However, a decomposition of $B(z^{-1})$ according to Lemma 15.1 gives


Biz- 1 ) = 


r 0 

6 2 ' 

fz" 4 0 l 

r 1 °1 

■H?* 

6 3 z _1 

O 

1 

1&*- 2 W 


(15.99) 


A feasible diagonal matrix $B_D(z^{-1})$ which can be obtained by diagonalization of $B$ from the right is

fz~ 3 O') 


(15.100) 


Pole assignment to $A_0(z^{-1})A_M(z^{-1}) = I_{2\times 2}$ yields

Tziz- 1 ) = 

and 


1 

V x z 

1 

I 


b<2 

TO 


Tiz- 1 ) = 


*i -sife] 

i h z ~ 2 0 

A solution of the Diophantine equation gives 


Ruiz- 1 ) = 


1 + ^-z - 3 


(15.101) 


(15.102) 


0 


1 + 6i 2 -2 + g j^ L z~ 3 + «2g6iz- 4 + a 3 |lz- 5 1 

2 ( 15 . 103 ) 


and 


-8 


0361,-2 

TO* 


ax 020^ -1 _ 6103 -2 bias 2 , 610203 -3 _ a f&i --4 


+ + - , - - 7 - ~^z-" - 


U2U3 4 


0203 


bih 


(15.104) 


S^z- 1 ) = 


Figure 15.13 Multivariable direct adaptive control of a discrete-time two-input, two-output system with strong cross-coupling properties. The outputs $y_1$ and $y_2$ (solid line) reproduce the reference signals $y_1^M$ and $y_2^M$ (dashed line) after a transient of adaptation. Notice also that the interaction between the outputs disappears after the transient of adaptation. The cross-coupling is visible in the inputs throughout the simulation.

The input estimation model for the inputs $u_1$ and $u_2$ applicable at time $k$ is


«!*-< = 

( -“h-7 * 

-yu-A 

y^it-e yZ-* ) 011 + ( ei ‘-> e2 *) 012 

“2 t _, = 

( <>Ivl ~4>ly 


) $21 + Cl* #22 

(15.105) 

where 




fau = 

( -«lw 

-“u-. 


fay = 

( -yu-, 

-JU-2 

1 T (15.106) 

-yi*_3 3 , 2*_3 -J2*.* -y2*_5 J 

<p2y m — 

SL 3 
and where $\theta_{11}$, $\theta_{12}$, $\theta_{21}$, $\theta_{22}$ contain the coefficients of $R_u$, $S_y$, $T$, and $T_2$. The correct pole-assignment control law would be

“ 1 * = [ - w u-3 -yu -y*k-2 y™ y% ) 

) > (15.107) 

U 2„ = ( <Plu °ly <Ply - ) 021 

where 

<t>2u 

<p2y 
<p2y m 

where parameter estimates $\hat\theta_{11}$ and $\hat\theta_{21}$ enter instead of $\theta_{11}$ and $\theta_{21}$ in the case of adaptive control. A simulation of adaptive control based on recursive least-squares identification of this system is provided in Fig. 15.13 for parameters

$$(a_1\; a_2\; a_3\; b_1\; b_2\; b_3) = (2\; 1\; -1\; 1\; 2\; 3) \qquad (15.109)$$


= -u u _ t -u u _ 

- (-yi* -yi*_, 


~ U lk-A U U-S 


~yi k -2 yth-2 y%k-2 y2k~4 


(15.108) 


yt. 


The $A$-parameters result in the characteristic polynomial $\det A(z^{-1}) = 1 - a_1z^{-1} - a_2a_3z^{-2} = (1 - z^{-1})^2$. Figure 15.13 shows that the multivariable adaptive control achieves the desired closed-loop properties and that the control effectively eliminates cross-coupling between the outputs. ■
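The factorization of $\det A(z^{-1})$ quoted above is easy to verify numerically for $(a_1, a_2, a_3) = (2, 1, -1)$. The sketch multiplies out $(1 - z^{-1})^2$ and computes the roots, which form a double root at $z = 1$.

```python
import numpy as np

a1, a2, a3 = 2.0, 1.0, -1.0
# det A(z^-1) = 1 - a1 z^-1 - a2 a3 z^-2, coefficients in ascending powers of z^-1
det_A = np.array([1.0, -a1, -a2 * a3])
target = np.convolve([1.0, -1.0], [1.0, -1.0])     # (1 - z^-1)^2
# Multiplying by z^2 gives z^2 - a1 z - a2 a3, so the same array lists the
# coefficients of a polynomial in z (descending powers) and np.roots gives the roots.
poles = np.roots(det_A)
print(det_A, target, poles)
```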


15.7 DISCUSSION AND CONCLUSIONS 

All the examples of adaptive systems have as a common feature the use of negative feedback as a principle to control or estimate necessary parameters. The principle of negative feedback, originating from control systems analysis, is used to control the output error signal, i.e., the deviation of the system from prescribed operation. The terminology used in the theory of artificial neural networks is that learning, supervised learning, or adaptation is an adaptive process that incorporates an external “teacher” or some global information with indications of correctness and performance. Supervised learning includes decisions on when to turn off adaptation and how to supply a priori information. In contrast, there also exist other methods called self-learning, unsupervised learning, or self-organization, which have no reference to a feedback principle or a “teacher” but instead rely upon internal criteria of correctness.

In the case of adaptive control there are two interacting feedback loops. It has been suggested that a properly organized feedback control loop might eliminate the need for parameter estimation. In fact, it was observed early that the output error and the sensitivity to parameter errors can be drastically reduced by using strong feedback in the feedback control loop. Strong feedback can be imposed by increasing the feedback gain parameters (“high gain”) or by using certain forms of relay feedback (“sliding modes”). There are, however, two fundamental limitations to the use of strong feedback control: first, the sensitivity to disturbances in the measured output and, second, stability problems caused by the presence of unmodeled time delays (see Section 8.5).

Indeed, there are control methods which systematically try to exploit the reduction of parameter sensitivity associated with high-gain feedback. This approach can be very successful in applications with a maximum transfer function phase shift of 90° over the frequency range. For larger phase shifts this is not a possible control principle but, as an extension, it is possible to establish limit-cycle oscillations by introducing a saturating high-gain nonlinearity in the feedback loop, e.g., a relay or a function $f(x) = b\tanh ax$. Limit-cycle oscillations may be characterized in terms of their period and maximum amplitude which, in turn, may provide sufficient information for calculation of gain and phase shift in the associated feedback loops. This idea has been practiced in response to the need for simple control methods and for identification and adaptive control of systems with a large range of gain variation. The resulting class of adaptive systems may exhibit low sensitivity to gain changes, but it may often be sensitive to amplification of external disturbances.
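The relay idea can be illustrated with the standard describing-function approximation, which is not derived in this chapter. An ideal relay of amplitude $d$ driven by a sinusoid of amplitude $a$ has equivalent gain $N(a) = 4d/(\pi a)$, so a measured limit cycle gives an estimate $4d/(\pi a)$ of the gain that brings the loop to its stability margin, and $2\pi/T$ estimates the frequency at which the phase lag is 180°. The Python sketch below assumes a hypothetical plant $1/(s+1)^3$ (for which this frequency is $\sqrt3$ rad/s and the critical gain is 8), forward-Euler integration and a zero reference. It is an illustration, not a tuning procedure from this book.

```python
import math

def relay_experiment(d=1.0, dt=1e-3, t_end=40.0, t_skip=20.0):
    """Relay feedback u = -d*sign(y) around the hypothetical plant 1/(s+1)^3.

    Simulated with forward Euler; returns the limit-cycle period and the
    peak amplitude of y measured after the transient (t > t_skip).
    """
    x1 = x2 = x3 = 0.0
    crossings = []
    peak = 0.0
    y_prev = 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        u = -d if x3 > 0.0 else d            # ideal relay, reference r = 0
        # three cascaded first-order lags: x' = -x + input
        x1 += dt * (-x1 + u)
        x2 += dt * (-x2 + x1)
        x3 += dt * (-x3 + x2)
        if t > t_skip:
            peak = max(peak, abs(x3))
            if y_prev < 0.0 <= x3:           # upward zero crossing of y
                crossings.append(t)
        y_prev = x3
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return period, peak

period, amplitude = relay_experiment()
Ku = 4.0 / (math.pi * amplitude)             # describing-function gain estimate
print(period, amplitude, Ku)                 # roughly 3.6 s, 0.16 and 8
```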


An important difference between adaptive system theory and identification theory is that stationary stochastic processes are not quite adequate for the analysis of adaptive processes. As assumptions of stationary behavior are irrelevant for adaptive processes, it is often difficult to characterize adaptation. An ad hoc method to approach this problem is to apply recursive estimation to the time series and to use the sequence of estimated parameters as a characterization. Additional difficulties arise when the assumptions on the model class are not met in application. As there are no means for interactive modification of model class assumptions or validation in an autonomous system, this imposes several limitations on the use of adaptive techniques.


In experimental work, for example in biology, various problems associated with adaptive systems are often encountered. As the object of investigation changes owing to adaptation in the course of an experiment, it may be very difficult to establish reproducible experimental conditions. How to appropriately characterize adaptive processes in experimental work should be considered an open problem.
APPENDIX 15.1

This appendix reviews some fundamental results from Lyapunov stability theory. A useful interpretation of the Lyapunov function $V(x)$ is sometimes that of the energy or variance associated with a variable $x$.

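Theorem 15.3—Lyapunov
Let

$$\dot x = f(x, t), \quad x \in \Omega \subset \mathbb{R}^n \qquad (15.110)$$

i. The function $V(x)$ is positive definite, i.e.,

$$V(0) = 0, \qquad V(x) > 0, \quad \|x\| \neq 0$$

ii. The gradient of $V(x)$, i.e.,

$$\frac{\partial V}{\partial x} \quad\text{exists for all } x \in \Omega \qquad (15.111)$$

iii. The derivative of $V(x)$ along $\dot x = f(x, t)$ is negative definite

$$\dot V = \frac{\partial V}{\partial x}\dot x < 0, \quad \forall x \neq 0 \qquad (15.112)$$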


Theorem 15.4

If there exists a positive definite function $V(x)$ whose derivative along $\dot x = f(x, t)$ is negative semidefinite, then the equilibrium is stable (in the sense of Lyapunov). ■

Theorem 15.5

If there exists a positive definite function $V(x)$ whose derivative along $\dot x = f(x, t)$ is negative definite, then the equilibrium is asymptotically stable. ■

Theorem 15.6

If, in addition to the assumptions i-iii, we require that $V(x)$ be radially unbounded, i.e.,

iv. There is a monotonically increasing function $\phi(\cdot)$ such that

$$V(x) \geq \phi(\|x\|), \quad \forall x \in \mathbb{R}^n \quad\text{and}\quad \lim_{\|x\|\to\infty}\phi(\|x\|) = \infty$$

then the equilibrium is globally asymptotically stable, that is, the domain of stability is all of $\mathbb{R}^n$. ■

The condition that $V(x)$ is radially unbounded is necessary to guarantee that the hypersurfaces $V(x) = c$ are closed for each arbitrary $c$, so that for each $c$ there is a scalar $\delta$ such that $\|x\| \leq \delta = \phi^{-1}(c)$.


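Theorem 15.7—Lyapunov
Let

$$\dot x = Ax \qquad (15.113)$$

where all eigenvalues of the matrix $A$ have negative real parts. Let the matrix $P$ be the solution to the Lyapunov matrix equation

$$PA + A^TP = -Q \qquad (15.114)$$

where $Q$ and $P$ are positive definite matrices. Then

$$V(x) = \frac12x^TPx \qquad (15.115)$$

is a Lyapunov function for the system $\dot x = Ax$. ■

Proof: Direct substitution of

$$P = \int_0^\infty e^{A^Tt}Qe^{At}\,dt \qquad (15.116)$$

into Eq. (15.114) shows that $P$ satisfies the Lyapunov equation (15.114): since $e^{At} \to 0$ as $t \to \infty$,

$$PA + A^TP = \int_0^\infty \frac{d}{dt}\big(e^{A^Tt}Qe^{At}\big)\,dt = \big[e^{A^Tt}Qe^{At}\big]_0^\infty = -Q.$$

Moreover, $\dot V = \frac12x^T(A^TP + PA)x = -\frac12x^TQx < 0$ for $x \neq 0$.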
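For a concrete check of Theorem 15.7, note that the Lyapunov equation (15.114) is linear in $P$ and can be solved by vectorization, using $\operatorname{vec}(A^TP) = (I \otimes A^T)\operatorname{vec}P$ and $\operatorname{vec}(PA) = (A^T \otimes I)\operatorname{vec}P$. The NumPy sketch below is one way to do this for small matrices (dedicated solvers are preferable for large ones). The test matrix $A$, with eigenvalues $-1$ and $-2$, is a hypothetical example.

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve P A + A^T P = -Q (Eq. 15.114) by vectorization (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(A^T P) = (I kron A^T) vec(P), vec(P A) = (A^T kron I) vec(P)
    M = np.kron(I, A.T) + np.kron(A.T, I)
    p = np.linalg.solve(M, -Q.reshape(-1, order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)                 # remove round-off asymmetry

A = np.array([[0.0, 1.0], [-2.0, -3.0]])  # eigenvalues -1, -2
Q = np.eye(2)
P = solve_lyapunov(A, Q)
print(P)                                   # approximately [[1.25, 0.25], [0.25, 0.25]]
print(np.linalg.eigvalsh(P))               # both positive, so V = x^T P x / 2 is positive definite
```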

Quadratic optimal control of a linear system

Optimal control of the state $x$ over a time interval $[t_0, t_f]$ by means of a control variable $u$ of a linear system $\dot x = Ax + Bu$ can be formalized as the quadratic optimization problem

$$
\begin{aligned}
&\text{Minimize } J(u) = \frac12x^T(t_f)Q_fx(t_f) + \frac12\int_{t_0}^{t_f}\big(x^T(t)Qx(t) + u^T(t)Ru(t)\big)\,dt\\
&\text{subject to } \dot x = Ax + Bu\\
&\text{for } Q_f = Q_f^T > 0,\quad Q = Q^T > 0,\quad R = R^T > 0
\end{aligned}
\qquad (15.117)
$$







There is a well-known solution to this problem in the theory of optimal control that can be obtained by dynamic programming. The dynamic programming reformulation of the optimization problem is to solve the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V}{\partial t} = \min_u\Big\{\frac12x^T(t)Qx(t) + \frac12u^T(t)Ru(t) + \frac{\partial V}{\partial x}\big(Ax(t) + Bu(t)\big)\Big\} \qquad (15.118)$$

By introducing the test function

$$V(x(t)) = \frac12x^T(t)P(t)x(t) \qquad (15.119)$$

and substituting $V$ into Eq. (15.118), we have (time argument $t$ omitted)

$$-\frac12x^T\dot Px = \min_u\Big\{\frac12x^TQx + \frac12u^TRu + x^TP(Ax + Bu)\Big\} \qquad (15.120)$$

By completing the squares we have

$$-\frac12x^T\dot Px = \min_u\Big\{\frac12(u + R^{-1}B^TPx)^TR(u + R^{-1}B^TPx) + \frac12x^TQx + x^TPAx - \frac12x^TPBR^{-1}B^TPx\Big\} \qquad (15.121)$$

The right-hand side of Eq. (15.121) has a minimum for the optimal control

$$u = u^* = -R^{-1}B^TPx \qquad (15.122)$$

where, since $x^TPAx = \frac12x^T(PA + A^TP)x$, $P$ solves the Riccati equation

$$-\dot P = PA + A^TP + Q - PBR^{-1}B^TP, \qquad P(t_f) = Q_f, \quad P = P^T > 0 \qquad (15.123)$$

and with the optimal control performance index

$$J(u^*) = V(x(t_0)) \qquad (15.124)$$

The value function $V$ is positive definite with respect to $x$, radially growing, and has the negative definite derivative

$$\frac{dV}{dt} = -\frac12x^TPBR^{-1}B^TPx - \frac12x^TQx < 0, \quad \|x\| \neq 0 \qquad (15.125)$$

Hence, $V$ qualifies as a Lyapunov function for the system, a fact which proves global system stability for the optimal control.
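Equation (15.123) can be integrated backward in time from $P(t_f) = Q_f$; over a long horizon $P$ settles to a constant, stationary solution. The sketch assumes a double integrator $\dot x_1 = x_2$, $\dot x_2 = u$ with $Q = I$, $R = 1$, $Q_f = 0$, and plain forward-Euler steps in reversed time. For this example the stationary solution is $P = \begin{pmatrix}\sqrt3 & 1\\ 1 & \sqrt3\end{pmatrix}$, giving the feedback $u^* = -(1\;\;\sqrt3)\,x$.

```python
import numpy as np

def riccati_backward(A, B, Q, R, Qf, horizon, dt):
    """Integrate -dP/dt = PA + A^T P + Q - P B R^{-1} B^T P (Eq. 15.123)
    backward from P(t_f) = Q_f over `horizon` seconds, using forward Euler
    in reversed time tau = t_f - t. Returns P(t_f - horizon)."""
    P = np.array(Qf, dtype=float)
    Rinv = np.linalg.inv(R)
    for _ in range(int(round(horizon / dt))):
        dP = P @ A + A.T @ P + Q - P @ B @ Rinv @ B.T @ P
        P = P + dt * dP
    return 0.5 * (P + P.T)

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double integrator
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.zeros((2, 2))

P = riccati_backward(A, B, Q, R, Qf, horizon=20.0, dt=2e-3)
K = np.linalg.inv(R) @ B.T @ P             # u* = -K x, Eq. (15.122)
closed_loop_poles = np.linalg.eigvals(A - B @ K)
print(P, K, closed_loop_poles)
```

A stationary point of the Euler recursion is exactly a solution of the algebraic Riccati equation, so the step size affects only how the limit is approached, not the limit itself.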
Remark: In the case of adaptive optimal control over a long time horizon $t_f - t_0$, so that $P$ and thus $\theta$ are constant, we replace Eq. (15.122) by $u = -\hat\theta^T\phi = u^* - \tilde\theta^T\phi$ for $\phi = x$ and introduce the Lyapunov function candidate

$$V(x, \tilde\theta) = \frac12x^TPx + \frac12\tilde\theta^TS\tilde\theta, \qquad S = S^T > 0 \qquad (15.126)$$

with the derivative

$$\dot V = -\frac12x^TPBR^{-1}B^TPx - \frac12x^TQx - x^TPB\phi^T\tilde\theta + \tilde\theta^TS\dot{\tilde\theta} \qquad (15.127)$$

The desirable adaptation law in order to make $\dot V < 0$ would be

$$\dot{\tilde\theta} = S^{-1}\phi B^TPx = -S^{-1}\phi Ru^* \qquad (15.128)$$

This is a possible control law, for example, in learning control when $u^*$ is known, but it cannot be used in the normal case of adaptive control where $u^*$ is not known in advance. ■
Basic Matrix Algebra


A.1 PRELIMINARIES 

We use the following definitions: 

Definition A.1

A rectangular array

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\qquad (A.1)
$$

with $m$ rows and $n$ columns of elements belonging to a field $\mathcal{F}$ is called a matrix over $\mathcal{F}$ of order $m\times n$. This is also denoted as $A \in \mathcal{F}^{m\times n}$.

It is standard to use the parallel notations

$$A = (a_{ij}) = A_{m\times n} \qquad (A.2)$$

in order to emphasize the elements of $A$ and the order, respectively.

Two $m\times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$ are equal if $a_{ij} = b_{ij}$ for every $i$ and $j$.

Definition A.2

A zero matrix $0_{m\times n}$ is an $m\times n$ matrix with all elements equal to zero. A matrix $A_{m\times n}$ is called a square matrix of order $n$ if $m = n$. A square matrix of order $n$

$$I_{n\times n} = (\delta_{ij}), \qquad \delta_{ij} = \begin{cases}1, & i = j\\ 0, & i \neq j\end{cases} \qquad (A.3)$$

is called an identity matrix. The identity matrix of order $n$ is sometimes denoted $I_n$.

Definition A.3—Vectors

An $m\times 1$ matrix $x$ is called a column vector, whereas a $1\times n$ matrix $y$ is called a row vector.

Definition A.4—Linear independence

A set of vectors $\{x_1, x_2, \dots, x_n\}$ is said to be linearly independent if and only if

$$a_1x_1 + a_2x_2 + \dots + a_nx_n = 0 \qquad (A.4)$$

implies that the constants $a_1 = a_2 = \dots = a_n = 0$. Conversely, the vectors are said to be linearly dependent if there are constants $a_1, \dots, a_n$, not all zero, such that (A.4) is satisfied.

Definition A.5

The transpose of $A$, denoted $A^T$, is the $n\times m$ matrix obtained from $A$ by interchanging rows and columns, i.e., $A^T = (a_{ji})$.

A square matrix $A$ is said to be symmetric if $A = A^T$, i.e., if $a_{ij} = a_{ji}$. Similarly, a square matrix $A$ is said to be skew-symmetric if $A = -A^T$, i.e., if $a_{ij} = -a_{ji}$.

Definition A.6

Let $A$ be a matrix over the complex numbers and let $A^*$ denote the transpose of the complex conjugate of $A$. A matrix $A$ over the complex numbers is said to be Hermitian if $A = A^*$, i.e., each element $a_{ij}$ is the complex conjugate of $a_{ji}$.

Definition A.7

The sum of two $m\times n$ matrices $A$ and $B$ is the $m\times n$ matrix $A + B = (a_{ij} + b_{ij})$.


Definition A.8

The product of an $m\times n$ matrix $A = (a_{ij})$ with a scalar $\alpha$ is the matrix $\alpha A = (\alpha a_{ij})$, where $\alpha$ multiplies each matrix element of $A$.


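Definition A.9—Matrix product

For an $m\times n$ matrix $A$ and an $n\times p$ matrix $B$, the product of $A$ and $B$ is defined to be the $m\times p$ matrix $C = AB$ with elements

$$c_{ik} = \sum_{j=1}^n a_{ij}b_{jk} \qquad (A.5)$$

Definition A.10—Determinant

To every square matrix $A$ of order $n$ can be associated a quantity called the determinant according to the recursive formula (for any fixed row index $i$)

$$\det A = \begin{cases} a_{11}, & n = 1\\ \sum_{j=1}^n a_{ij}A_{ij}, & n > 1\end{cases} \qquad\text{where } A_{ij} = (-1)^{i+j}\det D_{ij} \qquad (A.6)$$

where $D_{ij}$ is the matrix obtained by deleting the $i$th row and $j$th column of $A$, and $A_{ij}$ is called the algebraic complement or cofactor of the element $a_{ij}$.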


An elementary property of matrix determinants is that for any square matrices $A$ and $B$ of the same order it holds that

$$\det(AB) = \det A\cdot\det B \qquad (A.7)$$


Definition A.11—Singular matrix
A square matrix $A$ is called singular if $\det A = 0$ and nonsingular if $\det A \neq 0$.


Definition A.12—Minor
A minor (or subdeterminant) of $A$ of order $p$ is defined as

$$\det\begin{pmatrix} a_{i_1j_1} & \cdots & a_{i_1j_p}\\ \vdots & & \vdots\\ a_{i_pj_1} & \cdots & a_{i_pj_p}\end{pmatrix} \qquad (A.8)$$

provided that $1 \leq i_1 < i_2 < \dots < i_p \leq n$ and $1 \leq j_1 < j_2 < \dots < j_p \leq n$, where $n$ is the order of $A$.

Definition A.13—Trace

The sum of the diagonal elements of a matrix $A$ of order $n$ is called the trace of $A$, which is denoted $\operatorname{tr}(A)$ so that

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} \qquad (A.9)$$

Definition A.14—Inverse of a matrix

If for a square matrix $A$ of order $n$ there exists another square matrix $B$ such that

$$AB = BA = I_{n \times n} \qquad (A.10)$$

then $B$ is called the inverse of $A$ and is denoted $A^{-1}$.

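Definition A.15—Orthonormal vectors
A set of vectors $v_1, \ldots, v_n$ is said to be orthonormal if

$$v_i^T v_j = \delta_{ij}, \qquad i, j = 1, \ldots, n \qquad (A.11)$$

where $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise).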

Definition A.16—Orthogonal matrix

A matrix $A$ with the property $A^T A = I$ is said to be an orthogonal matrix.

Result A.1

If $A$ is a square orthogonal matrix, then $A^T = A^{-1}$ and $A$ has orthonormal column vectors as well as orthonormal row vectors.

Definition A.17—Unitary matrix

A unitary matrix $A$ is a square matrix of order $n$ such that $A^{-1} = A^*$, i.e., the inverse of $A$ is the same as the transpose of the complex conjugate of $A$.

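Theorem A.1—Cramer's rule

A square matrix $A$ of order $n$ has an inverse if and only if $A$ is nonsingular. The inverse $A^{-1}$ is then unique with elements

$$A^{-1} = \frac{\operatorname{adj}(A)}{\det A} = \frac{1}{\det A}\begin{pmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{pmatrix} \qquad (A.12)$$

where the adjugate $\operatorname{adj}(A)$ is the transpose of the matrix of algebraic complements $A_{ij}$ to the elements of $A$.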
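Eq. (A.12) can be sketched in Python as follows. The sketch assumes integer or rational entries, so that exact arithmetic with `fractions.Fraction` can be used, and the helper names are illustrative. The adjugate is the transpose of the cofactor matrix, so element $(i, j)$ of the inverse uses the cofactor $A_{ji}$.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row, Eq. (A.6)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j + 1:] for r in a[1:]])
               for j in range(len(a)))


def cofactor(a, i, j):
    """Algebraic complement A_ij = (-1)^(i+j) det D_ij (0-based indices)."""
    d = [r[:j] + r[j + 1:] for k, r in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(d)


def inverse_adj(a):
    """A^{-1} = adj(A) / det A, Eq. (A.12); element (i, j) is A_ji / det A."""
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular, no inverse exists")
    if n == 1:
        return [[Fraction(1) / d]]
    return [[Fraction(cofactor(a, j, i)) / d for j in range(n)] for i in range(n)]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
Ainv = inverse_adj(A)
```

A singular argument is rejected, in line with the "if and only if" in Theorem A.1.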
Theorem A.2

Every square matrix $A$ of order $n$ satisfies the relationships

$$\sum_{i=1}^{n} a_{ik} A_{ij} = \delta_{jk} \det A, \qquad \sum_{i=1}^{n} a_{ki} A_{ji} = \delta_{jk} \det A \qquad (A.13)$$


Theorem A.3

If $A$ and $B$ are two nonsingular matrices of order $n$, then the product $AB$ is nonsingular and

$$(AB)^{-1} = B^{-1} A^{-1} \qquad (A.14)$$

Definition A.18—Rank of a matrix

The rank of a matrix $A$ of order $m \times n$ is the maximum order of its non-zero minors (subdeterminants).

Definition A.19

Given a square matrix $A$, if there exist a non-zero column vector $x$ and a scalar $\lambda$ such that

$$Ax = \lambda x \qquad (A.15)$$

then $\lambda$ is called an eigenvalue of $A$ and $x$ is an eigenvector associated with the eigenvalue $\lambda$.

The eigenvalues can be obtained by solving the characteristic equation

$$\det(A - \lambda I) = 0 \qquad (A.16)$$

where $\det(A - \lambda I)$ is called the characteristic polynomial. For $A$ of order $n$ there are $n$ eigenvalues $\lambda_1, \ldots, \lambda_n$ (counted with multiplicity) solving Eq. (A.16).

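Let $\Lambda$ denote the following diagonal matrix of eigenvalues

$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} \qquad (A.17)$$

Result A.2—Diagonalization
Assume that the $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors $x_1, \ldots, x_n$, organized as the columns of a matrix $T$. In addition, let $\Lambda$ be the diagonal matrix (A.17) of the corresponding eigenvalues. Then

$$T^{-1} A T = \Lambda \qquad (A.18)$$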

Proof: Using the fact that the columns of $T$ are eigenvectors of $A$ it is straightforward to establish that

$$AT = A\begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} = \begin{pmatrix} \lambda_1 x_1 & \lambda_2 x_2 & \cdots & \lambda_n x_n \end{pmatrix} = T\Lambda \qquad (A.19)$$

The matrix $T$ is invertible because its columns are linearly independent. Multiplication of Eq. (A.19) from the left by $T^{-1}$ provides the result (A.18).

Result A.3

The eigenvalues of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ are real, and there exist orthonormal eigenvectors $x_1, x_2, \ldots, x_n$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ so that

$$A = T\Lambda T^{-1} = T\Lambda T^T \qquad (A.20)$$

where the columns of $T$ are the eigenvectors of $A$.

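Theorem A.4

Let $A$ and $B$ be any $m \times n$ matrices. Then

$$\operatorname{tr}(AB^T) = \operatorname{tr}(B^T A) \qquad (A.21)$$

Proof: Explicit use of the definition of a matrix product gives

$$\operatorname{tr}(AB^T) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ij} = \operatorname{tr}(B^T A) \qquad (A.22)$$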


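Result A.4

The $n$ eigenvalues $\lambda_1, \ldots, \lambda_n$ solving Eq. (A.16) also satisfy the relations

$$\det A = \lambda_1 \lambda_2 \cdots \lambda_n, \qquad \operatorname{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n \qquad (A.23)$$

Proof: For a diagonalizable matrix (Result A.2) we have $A = T\Lambda T^{-1}$, and it follows from Eq. (A.7) that $\det A = \det(T\Lambda T^{-1}) = \det T \det\Lambda \det(T^{-1}) = \det\Lambda = \lambda_1 \lambda_2 \cdots \lambda_n$. According to Eq. (A.21) we also find that $\operatorname{tr}(A) = \operatorname{tr}(T\Lambda T^{-1}) = \operatorname{tr}((\Lambda T^{-1})T) = \operatorname{tr}(\Lambda) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. (The relations (A.23) hold for any square matrix; the proof given here uses the diagonal form.)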


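Result A.5

Assume that the nonsingular matrix $A$ is partitioned as

$$A = \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \qquad (A.24)$$

where $P$ and $S$ are square nonsingular matrices. Then

$$A^{-1} = \begin{pmatrix} (P - QS^{-1}R)^{-1} & -(P - QS^{-1}R)^{-1}QS^{-1} \\ -(S - RP^{-1}Q)^{-1}RP^{-1} & (S - RP^{-1}Q)^{-1} \end{pmatrix} \qquad (A.25)$$

Proof: The matrix $A$ can be expressed as

$$\begin{pmatrix} P & Q \\ R & S \end{pmatrix} = \begin{pmatrix} I & 0 \\ RP^{-1} & I \end{pmatrix}\begin{pmatrix} P & 0 \\ 0 & S - RP^{-1}Q \end{pmatrix}\begin{pmatrix} I & P^{-1}Q \\ 0 & I \end{pmatrix} \qquad (A.26)$$

or

$$\begin{pmatrix} P & Q \\ R & S \end{pmatrix} = \begin{pmatrix} I & QS^{-1} \\ 0 & I \end{pmatrix}\begin{pmatrix} P - QS^{-1}R & 0 \\ 0 & S \end{pmatrix}\begin{pmatrix} I & 0 \\ S^{-1}R & I \end{pmatrix} \qquad (A.27)$$

Straightforward application of Eq. (A.14) to Eq. (A.26) and Eq. (A.27) proves the result (A.25).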
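Eq. (A.25) is easy to check numerically. The Python sketch below uses exact `Fraction` arithmetic and assumes 2 × 2 blocks, so that a closed-form 2 × 2 inverse can be used for $P$, $S$ and the two matrices $P - QS^{-1}R$ and $S - RP^{-1}Q$. All names and numbers are illustrative.

```python
from fractions import Fraction


def mul(x, y):
    """Matrix product, Eq. (A.5)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def sub(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def neg(x):
    return [[-a for a in r] for r in x]


def inv2(x):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = x
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def block_inverse(P, Q, R, S):
    """Inverse of A = [[P, Q], [R, S]] from Eq. (A.25), for 2 x 2 blocks.

    Assumes P, S, P - Q S^-1 R and S - R P^-1 Q are all nonsingular.
    """
    Pi, Si = inv2(P), inv2(S)
    X = inv2(sub(P, mul(mul(Q, Si), R)))  # (P - Q S^-1 R)^-1
    Y = inv2(sub(S, mul(mul(R, Pi), Q)))  # (S - R P^-1 Q)^-1
    top = [r1 + r2 for r1, r2 in zip(X, neg(mul(mul(X, Q), Si)))]
    bottom = [r1 + r2 for r1, r2 in zip(neg(mul(mul(Y, R), Pi)), Y)]
    return top + bottom


P, Q = [[4, 1], [2, 3]], [[1, 0], [0, 1]]
R, S = [[0, 1], [1, 0]], [[3, 1], [1, 2]]
A = [p + q for p, q in zip(P, Q)] + [r + s for r, s in zip(R, S)]  # 4 x 4 matrix (A.24)
```

Multiplying `A` by `block_inverse(P, Q, R, S)` from either side gives the 4 × 4 identity exactly.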

Result A.6—Matrix inversion lemma

Let the matrix $A$ of order $n$ be nonsingular and let $B$ and $C$ be $n \times m$ and $m \times n$ matrices, respectively. Then, provided that $I_{m \times m} + CA^{-1}B$ is nonsingular,

$$(A + BC)^{-1} = A^{-1} - A^{-1}B(I_{m \times m} + CA^{-1}B)^{-1}CA^{-1} \qquad (A.28)$$

Proof: This can be verified by straightforward multiplication of Eq. (A.28) by $(A + BC)$.
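The lemma is most useful when $m \ll n$, because only an $m \times m$ matrix has to be inverted besides $A$. The following minimal Python check uses exact rational arithmetic and assumes $n = 2$ and $m = 1$, so that $I + CA^{-1}B$ is a scalar. It compares both sides of Eq. (A.28); names and numbers are illustrative.

```python
from fractions import Fraction


def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def inv2(x):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = x
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def lemma_rhs(A, B, C):
    """Right-hand side of Eq. (A.28) for n = 2 and m = 1 (a rank-one update).

    With m = 1 the matrix I + C A^-1 B is a scalar, so its inverse is a division.
    """
    Ai = inv2(A)
    s = 1 + mul(mul(C, Ai), B)[0][0]
    corr = mul(mul(mul(Ai, B), C), Ai)
    return [[a - c / s for a, c in zip(ra, rc)] for ra, rc in zip(Ai, corr)]


A = [[2, 1], [1, 3]]
B = [[1], [2]]  # n x m with m = 1
C = [[3, 1]]    # m x n
```

`lemma_rhs(A, B, C)` and `inv2(add(A, mul(B, C)))` agree exactly.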

Definition A.20—Quadratic forms
A real-valued quadratic form is defined as

$$x^T A x, \qquad A \in \mathbb{R}^{n \times n} \qquad (A.29)$$

where $A$ is a symmetric matrix. The quadratic form and the matrix $A$ are said to be positive definite if $x^T A x > 0$ for all non-zero $x$; all the eigenvalues of a positive definite matrix are positive.

Definition A.21—Projection matrix

A matrix $P$ with the properties $P = P^T$ and $P^2 = P$ is a projection matrix.

Theorem A.5—Projection

Let $A \in \mathbb{R}^{m \times n}$ be a matrix with linearly independent columns and column space $C$. The projection of a vector $v$ onto $C$ is $Pv$, where $P$ is the projection matrix

$$P = A(A^T A)^{-1} A^T \qquad (A.30)$$
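The defining properties of Definition A.21 can be checked for the matrix (A.30). The sketch below is Python with exact arithmetic and assumes, for brevity, that $A$ has two linearly independent columns, so that $A^TA$ is a nonsingular 2 × 2 matrix. The helper names are illustrative.

```python
from fractions import Fraction


def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def inv2(x):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = x
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def projection_matrix(A):
    """P = A (A^T A)^{-1} A^T, Eq. (A.30), for an m x 2 matrix A of full column rank."""
    At = transpose(A)
    return mul(mul(A, inv2(mul(At, A))), At)


A = [[1, 0], [1, 1], [1, 2]]
P = projection_matrix(A)
```

The resulting `P` is symmetric and idempotent and leaves the columns of `A` unchanged (`mul(P, A) == A`), as expected for the projection onto the column space.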
A.2 MATRIX NORMS


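Let $x \in \mathbb{R}^n$ be a vector. For every vector norm $\|x\|$ it is required that $\|x\| \ge 0$ with $\|x\| = 0$ only for $x = 0$, and that $\|\alpha x\| = |\alpha|\,\|x\|$ for a scalar $\alpha$. Moreover, for vectors $x$, $y$ it is required that

$$\|x + y\| \le \|x\| + \|y\|$$

and for the 2-norm one also has the Cauchy–Schwarz inequality $|x^T y| \le \|x\|_2 \|y\|_2$.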


Some important examples of vector norms are

$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n| \qquad \text{(1-norm)}$$

$$\|x\|_2 = \sqrt{x^T x} = (|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2)^{1/2} \qquad \text{(2-norm)}$$

$$\|x\|_p = (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^{1/p} \qquad (p\text{-norm})$$

$$\|x\|_\infty = \max(|x_1|, |x_2|, \ldots, |x_n|) \qquad (\infty\text{-norm})$$

Let $W$ be some positive definite symmetric matrix. A weighted 2-norm is defined as

$$\|x\|_W = \sqrt{x^T W x}, \qquad W = W^T > 0 \qquad (A.32)$$
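A small Python sketch of the vector norms above and the weighted norm (A.32); the function names are illustrative.

```python
import math


def norm_p(x, p):
    """p-norm of a vector for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def norm_inf(x):
    """Infinity-norm: the largest absolute component."""
    return max(abs(t) for t in x)


def norm_weighted(x, W):
    """Weighted 2-norm sqrt(x^T W x), Eq. (A.32); W symmetric positive definite."""
    n = len(x)
    return math.sqrt(sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n)))


x = [3.0, -4.0]
print(norm_p(x, 1), norm_p(x, 2), norm_inf(x))  # prints 7.0 5.0 4.0
```

For large $p$ the $p$-norm approaches the $\infty$-norm; for this vector, `norm_p(x, 50)` is already very close to `norm_inf(x)`.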


Definition A.22—Matrix norms

The matrix norm of a matrix $A$ induced by a vector norm $\|\cdot\|_p$ is

$$\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} \qquad (A.33)$$


Definition A.23—Frobenius norm
The Frobenius norm of a matrix $A \in \mathbb{R}^{m \times n}$ is

$$\|A\|_F = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2} \qquad (A.34)$$


A.3 SINGULAR VALUE DECOMPOSITION

Let $A$ be an $m \times n$ matrix of generally complex-valued elements. Then there exist $m \times m$ and $n \times n$ unitary matrices $U$ and $V$ (i.e., $U^{-1} = U^*$ and $V^{-1} = V^*$) such that

$$A = U\Sigma V^* \qquad (A.35)$$

where $\Sigma$ is an $m \times n$ matrix whose elements are zero except possibly along its main diagonal. These nonnegative diagonal elements are ordered such that

$$\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0, \qquad p = \min(m, n) \qquad (A.36)$$

The diagonal elements $\sigma_{kk}$ are known as the singular values of the matrix $A$, and the nonzero singular values are the positive square roots of the nonzero eigenvalues of the nonnegative definite Hermitian matrices $AA^*$ and $A^*A$. The columns of $U$ and $V$ are orthonormal eigenvectors of $AA^*$ and $A^*A$, respectively. The matrix decomposition (A.35) is known as the singular value decomposition or SVD.

A useful property of the SVD is that several matrix norms can be expressed in terms of the singular values. In particular, one has

$$\|A\|_2 = \sigma_{11}, \qquad \|A\|_F = \sqrt{\sigma_{11}^2 + \sigma_{22}^2 + \cdots + \sigma_{pp}^2} \qquad (A.37)$$
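For a real 2 × 2 matrix the singular values can be obtained in closed form as the square roots of the eigenvalues of the symmetric matrix $A^TA$, which makes (A.37) easy to check. The following Python sketch is illustrative only; for general matrices a library SVD routine would be used instead.

```python
import math


def singular_values_2x2(a):
    """Singular values sigma_11 >= sigma_22 of a real 2 x 2 matrix.

    They are computed as square roots of the eigenvalues of M = A^T A.
    """
    (p, q), (r, s) = a
    m11 = p * p + r * r  # entries of the symmetric matrix A^T A
    m12 = p * q + r * s
    m22 = q * q + s * s
    half_tr = (m11 + m22) / 2.0
    det = m11 * m22 - m12 * m12
    disc = math.sqrt(max(half_tr * half_tr - det, 0.0))
    lam1, lam2 = half_tr + disc, max(half_tr - disc, 0.0)
    return math.sqrt(lam1), math.sqrt(lam2)


A = [[3.0, 0.0], [4.0, 5.0]]
s1, s2 = singular_values_2x2(A)
frobenius = math.sqrt(sum(v * v for row in A for v in row))  # Eq. (A.34)
```

For this example $A^TA$ has eigenvalues 45 and 5, so $\sigma_{11}^2 + \sigma_{22}^2 = 50 = \|A\|_F^2$. Sampling $\|Ax\|_2$ over unit vectors $x$ recovers $\sigma_{11} = \|A\|_2$.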


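Assume that $r = \operatorname{rank}(A)$ and define the matrix $A^+ = V\Sigma^+ U^*$, where

$$\Sigma^+ = \begin{pmatrix} \operatorname{diag}(1/\sigma_{11}, \ldots, 1/\sigma_{rr}) & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{pmatrix} \in \mathbb{R}^{n \times m} \qquad (A.38)$$

The matrix $A^+$ is called the pseudo inverse (or Moore–Penrose inverse) of $A$. If $r = \operatorname{rank}(A) = n$, then $A^+ = (A^*A)^{-1}A^*$, and clearly, if $m = n = \operatorname{rank}(A)$, then $A^+ = A^{-1}$. For a real-valued matrix $A$ the pseudo inverse is $A^+ = V\Sigma^+ U^T$.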

The singular value decomposition is closely related to the least-squares problem in that the least-squares solution is

$$\hat{x} = \arg\min_x \|Ax - b\|_2^2 = A^+ b \qquad (A.39)$$

for a matrix $A$ and a vector $b$. In the rank-deficient case, when there are infinitely many solutions that minimize $\|Ax - b\|_2$, $\hat{x} = A^+ b$ is the solution with the smallest 2-norm. The residual sum at the optimum is

$$\|A\hat{x} - b\|_2^2 = \|(AA^+ - I)b\|_2^2 \qquad (A.40)$$
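When $A$ has full column rank, $A^+ = (A^TA)^{-1}A^T$, so (A.39) and (A.40) can be illustrated without computing an SVD. The following Python sketch uses exact arithmetic and illustrative names to fit a straight line to three points. The rank-deficient case would require the SVD-based definition (A.38) instead.

```python
from fractions import Fraction


def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def inv2(x):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = x
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def pinv_full_column_rank(A):
    """A^+ = (A^T A)^{-1} A^T, valid when rank(A) = n (here n = 2)."""
    At = transpose(A)
    return mul(inv2(mul(At, A)), At)


A = [[1, 0], [1, 1], [1, 2]]  # fit b_i = x_1 + x_2 t_i at t = 0, 1, 2
b = [[1], [2], [2]]
x_hat = mul(pinv_full_column_rank(A), b)                      # Eq. (A.39)
residual = [[bi[0] - fi[0]] for bi, fi in zip(b, mul(A, x_hat))]
rss = sum(r[0] ** 2 for r in residual)                        # Eq. (A.40)
```

For this data $\hat{x} = (7/6, 1/2)^T$, the residual sum is $1/6$, and the residual is orthogonal to the columns of $A$.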
The singular value decomposition is also valuable for finding the $m \times n$ matrix $A_{(k)}$ of rank $k \le \operatorname{rank}(A)$ that best approximates $A$ in the 2-norm sense, i.e.,

$$\min_{\operatorname{rank}(A_{(k)}) = k} \|A - A_{(k)}\|_2 \qquad (A.41)$$

A solution to this problem is

$$A_{(k)} = U\Sigma_k V^* \qquad (A.42)$$

where $U$ and $V$ are as in (A.35), while $\Sigma_k$ is obtained from $\Sigma$ by setting to zero all but its $k$ largest singular values. The accuracy of this approximation is

$$\|A - A_{(k)}\|_2 = \sigma_{k+1,k+1} \quad \text{or} \quad \|A - A_{(k)}\|_F^2 = \sum_{j=k+1}^{p} \sigma_{jj}^2 \qquad (A.43)$$


Another standard problem, arising in the least-squares normal equations, is to find the eigenvalues of $AA^T$ (or $A^TA$), where $A$ is some given matrix. The singular value decomposition also provides a solution to this problem: the eigenvalues of $AA^T$ are the squared diagonal elements $\sigma_{kk}^2$ of $\Sigma$, together with $m - n$ additional zero eigenvalues when $m > n$.


A.4 QR-FACTORIZATION 

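Consider a system of linear equations of the form

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^m \qquad (A.44)$$

Assume that an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ (i.e., $QQ^T = I$) can be computed so that

$$Q^T A = R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \qquad R_1 \in \mathbb{R}^{n \times n} \qquad (A.45)$$

with $R_1$ being upper triangular. Let

$$Q^T b = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}, \qquad s_1 \in \mathbb{R}^n, \quad s_2 \in \mathbb{R}^{m-n} \qquad (A.46)$$

The residual sum of a least-squares problem can then be expressed as

$$\|Ax - b\|_2^2 = \|Q^T A x - Q^T b\|_2^2 = \|R_1 x - s_1\|_2^2 + \|s_2\|_2^2 \qquad (A.47)$$

and, provided that $A$ has full column rank so that $R_1$ is nonsingular, the least-squares solution is obtained by solving the upper triangular system

$$R_1 x = s_1 \qquad (A.48)$$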

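The method described is called QR-factorization, and a good numerical method to compute the triangularizing matrix $Q$ is the Householder method. As suggested by Householder, such a transformation can be accomplished by choosing

$$Q^T = P_n P_{n-1} \cdots P_1, \qquad P_k = I_{m \times m} - 2\frac{v_k v_k^T}{v_k^T v_k} \qquad (A.49)$$

where $\{P_k\}_{k=1}^{n}$ is a sequence of symmetric orthogonal matrices with the property $P_k^2 = I$. Let the vector $v_1$ that defines $P_1$ be chosen as

$$v_1 = a_1 - \sqrt{a_1^T a_1}\, e_1, \qquad a_1 = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \end{pmatrix}^T, \quad e_1 = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}^T \qquad (A.50)$$

The matrix product $P_1 A$ is then a matrix whose first column is zero except for its first element. This is clearly a first step toward triangularization, and it is suitable to proceed by recursion. Assume that an intermediate result at step $k - 1$ has been obtained as

$$P_{k-1} P_{k-2} \cdots P_1 A = \begin{pmatrix} R_{11}^{(k-1)} & R_{12}^{(k-1)} \\ 0_{(m-k+1) \times (k-1)} & A^{(k)} \end{pmatrix} \qquad (A.51)$$

where $R_{11}^{(k-1)}$ is an upper triangular matrix of dimensions $(k-1) \times (k-1)$ and $A^{(k)} = (a_{ij}^{(k)})$ is of dimensions $(m-k+1) \times (n-k+1)$. Let $v_k$, and thus by extension $P_k$ of Eq. (A.49), be chosen according to the algorithm

$$\begin{aligned} \alpha_k &= \begin{pmatrix} 0_{1 \times (k-1)} & a_{11}^{(k)} & \cdots & a_{(m-k+1)1}^{(k)} \end{pmatrix}^T \\ e_k &= \begin{pmatrix} 0_{1 \times (k-1)} & 1 & 0_{1 \times (m-k)} \end{pmatrix}^T \\ v_k &= \alpha_k - \sqrt{\alpha_k^T \alpha_k}\, e_k \end{aligned} \qquad (A.52)$$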

It is then straightforward to verify that the structure (A.51) is preserved when $P_k$ is applied, i.e., it holds with $k$ replaced by $k + 1$, and by induction up to step $n$ the procedure results in the upper triangular form (A.45). (For further details see Golub and Van Loan, 1989.)
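The Householder recursion (A.49)–(A.52) and the back substitution (A.48) fit in a short Python sketch. It follows the sign convention $v_k = \alpha_k - \sqrt{\alpha_k^T\alpha_k}\,e_k$ of Eq. (A.52). Production codes usually give the second term the sign of the $k$th element of $\alpha_k$ instead, to avoid cancellation (see Golub and Van Loan). The function name is hypothetical, and full column rank is assumed.

```python
import math


def householder_ls(A, b):
    """Least-squares solution of Ax = b by Householder triangularization.

    Follows Eqs. (A.45)-(A.52); assumes A is m x n with m >= n and full column
    rank, so that R_1 is nonsingular. Returns (x, ||s_2||^2), cf. Eq. (A.47).
    """
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]  # becomes P_k ... P_1 A
    s = [float(v) for v in b]                   # becomes P_k ... P_1 b
    for k in range(n):  # 0-based step k handles column k
        # alpha: column k with the entries above the diagonal set to zero, Eq. (A.52)
        alpha = [0.0] * k + [R[i][k] for i in range(k, m)]
        v = alpha[:]
        v[k] -= math.sqrt(sum(t * t for t in alpha))  # v = alpha - |alpha| e_k
        vv = sum(t * t for t in v)
        if vv == 0.0:  # column already triangular: P_k = I
            continue
        # apply P_k = I - 2 v v^T / (v^T v) to each column of R and to s
        for j in range(n):
            f = 2.0 * sum(v[i] * R[i][j] for i in range(m)) / vv
            for i in range(m):
                R[i][j] -= f * v[i]
        f = 2.0 * sum(v[i] * s[i] for i in range(m)) / vv
        for i in range(m):
            s[i] -= f * v[i]
    # back substitution for R_1 x = s_1, Eq. (A.48)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (s[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x, sum(t * t for t in s[n:])


x, rss = householder_ls([[1, 0], [1, 1], [1, 2]], [1, 2, 2])
```

For the straight-line data used here, the result agrees (up to rounding) with the normal-equation solution $x = (7/6, 1/2)^T$ and residual sum $\|s_2\|_2^2 = 1/6$.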
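A.5 MATRIX DIFFERENTIATION

Let $x \in \mathbb{R}^n$ be a vector and let $f(x)$ be a scalar function of $x$. The partial derivative of $f$ with respect to $x$ is the row vector

$$\frac{\partial f}{\partial x} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{pmatrix} \qquad (A.53)$$

Differentiation of a vector-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$ with respect to $x$ gives the Jacobian matrix

$$\frac{\partial f}{\partial x} = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix} \qquad (A.54)$$

Differentiation of a scalar function $f(A)$ with respect to a matrix $A$ of order $n$ gives a matrix with elements

$$\frac{\partial f}{\partial A} = \begin{pmatrix} \dfrac{\partial f}{\partial a_{11}} & \cdots & \dfrac{\partial f}{\partial a_{1n}} \\ \vdots & & \vdots \\ \dfrac{\partial f}{\partial a_{n1}} & \cdots & \dfrac{\partial f}{\partial a_{nn}} \end{pmatrix} \qquad (A.55)$$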

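Example A.1

Let $A$ be a square matrix in $\mathbb{R}^{n \times n}$ with elements $a_{ij}$ and let $V$ be the scalar function

$$V(x) = x^T A x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j, \qquad x \in \mathbb{R}^n \qquad (A.56)$$

Element-wise differentiation gives

$$\frac{\partial V}{\partial x_k} = \sum_{i=1}^{n}\sum_{j=1}^{n} \left(a_{ij}\delta_{ik} x_j + a_{ij} x_i \delta_{jk}\right) = e_k^T A x + x^T A e_k \qquad (A.57)$$

where $e_k$ is the column vector that is zero except for a 1 in the $k$th position, and differentiation with respect to the column vector $x$ yields

$$\frac{\partial V}{\partial x} = x^T A^T + x^T A = x^T (A + A^T) \qquad (A.58)$$

which reduces to $2x^T A$ when $A$ is symmetric.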
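Eq. (A.58) can be checked against central finite differences. In the Python sketch below the matrix is deliberately non-symmetric, so the result differs from $2x^TA$. The function names and the step size `h` are illustrative.

```python
def quad_form(A, x):
    """V(x) = x^T A x, Eq. (A.56)."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def grad_formula(A, x):
    """Row vector x^T (A + A^T), Eq. (A.58)."""
    n = len(x)
    return [sum(x[i] * (A[i][k] + A[k][i]) for i in range(n)) for k in range(n)]


def grad_numeric(A, x, h=1e-6):
    """Central finite differences of V with respect to each x_k."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((quad_form(A, xp) - quad_form(A, xm)) / (2.0 * h))
    return g


A = [[1.0, 2.0], [0.0, 3.0]]  # deliberately non-symmetric
x = [1.5, -2.0]
```

Both functions return approximately $(-1, -9)$ for this example.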
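Result A.7

Let $A$ be a square nonsingular matrix of order $n$ with $\det A > 0$. Then

$$\frac{\partial}{\partial A} \log\det A = A^{-T} \qquad (A.59)$$

where $A^{-T} = (A^{-1})^T$.

Proof: From Eq. (A.13) it follows that, for any fixed $j$,

$$\det A = \sum_{i=1}^{n} a_{ij} A_{ij} \qquad (A.60)$$

so that, since the cofactor $A_{ij}$ does not depend on $a_{ij}$,

$$\frac{\partial \log\det A}{\partial a_{ij}} = \frac{A_{ij}}{\sum_{l=1}^{n} a_{lj} A_{lj}} = \frac{A_{ij}}{\det A} \qquad (A.61)$$

Identification of terms between Eq. (A.61) and Eq. (A.12), which gives $(A^{-1})_{ji} = A_{ij}/\det A$, proves the result (A.59).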

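Result A.8

Let $A$ be a square nonsingular matrix of order $n$. Then

$$\frac{\partial}{\partial A} \operatorname{tr}(BA^{-1}) = -(A^{-1} B A^{-1})^T \qquad (A.62)$$

Proof: From the definition of a matrix inverse it holds that $AA^{-1} = I_{n \times n}$, and differentiation of $I_{n \times n}$ gives

$$\frac{\partial A}{\partial a_{ij}} A^{-1} + A \frac{\partial A^{-1}}{\partial a_{ij}} = 0 \qquad (A.63)$$

so that

$$\frac{\partial A^{-1}}{\partial a_{ij}} = -A^{-1} \frac{\partial A}{\partial a_{ij}} A^{-1} \qquad (A.64)$$

The matrix $\partial A/\partial a_{ij}$ can be expressed as

$$\frac{\partial A}{\partial a_{ij}} = e_i e_j^T \qquad (A.65)$$

where $e_i$ is a zero vector except for 1 in the $i$th position. Consequently,

$$\frac{\partial}{\partial a_{ij}} \operatorname{tr}(BA^{-1}) = \operatorname{tr}\Big(B \frac{\partial A^{-1}}{\partial a_{ij}}\Big) = -\operatorname{tr}(BA^{-1} e_i e_j^T A^{-1}) = -e_i^T (A^{-1} B A^{-1})^T e_j \qquad (A.66)$$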
Hence,

$$\frac{\partial}{\partial A} \operatorname{tr}(BA^{-1}) = -(A^{-1} B A^{-1})^T \qquad (A.67)$$

which proves the statement (A.62).
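Both Result A.7 and Result A.8 are easy to confirm numerically by perturbing one element $a_{ij}$ at a time. The Python sketch below does this for an arbitrarily chosen 2 × 2 example with $\det A > 0$ and lays out the partial derivatives as in Eq. (A.55). All names and values are illustrative.

```python
import math


def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]


def inv2(x):
    (a, b), (c, d) = x
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def log_det(A):
    """log det A for a 2 x 2 matrix with positive determinant."""
    return math.log(A[0][0] * A[1][1] - A[0][1] * A[1][0])


def trace_of_b_ainv(B):
    """Return the function M -> tr(B M^{-1})."""
    def f(M):
        P = mul(B, inv2(M))
        return P[0][0] + P[1][1]
    return f


def numeric_grad(f, A, h=1e-6):
    """Matrix of central differences df/da_ij, laid out as in Eq. (A.55)."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (f(Ap) - f(Am)) / (2.0 * h)
    return G


A = [[3.0, 1.0], [0.5, 2.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
G_logdet = numeric_grad(log_det, A)               # compare with A^{-T}, Eq. (A.59)
G_trace = numeric_grad(trace_of_b_ainv(B), A)     # compare with -(A^-1 B A^-1)^T, Eq. (A.62)
```

Both numerical gradients agree with the closed-form expressions to within the finite-difference error.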


A.6 POLYNOMIALS AND POLYNOMIAL MATRICES

Let $\mathbb{R}[x]$ and $\mathbb{R}(x)$ denote the polynomials and rational functions, respectively, in the indeterminate $x$ with coefficients in $\mathbb{R}$.

A polynomial of degree greater than 0 is said to be irreducible if it cannot be written as a product of two polynomials of degree greater than 0.

Definition A.24—Polynomial matrix

A polynomial matrix is a matrix with elements in $\mathbb{R}[x]$, and the set of $m \times n$ matrices with elements in $\mathbb{R}[x]$ is denoted $\mathbb{R}^{m \times n}[x]$.

Definition A.25—Unimodular matrix

A polynomial matrix $A \in \mathbb{R}^{n \times n}[x]$ is said to be unimodular if and only if $\det A = c$, where $c \in \mathbb{R} \setminus \{0\}$.

Definition A.26—Matrix rank

The rank of a polynomial matrix $A \in \mathbb{R}^{m \times n}[x]$ is the highest order of any non-zero minor of $A$.

Definition A.27—Invertibility

The matrix $A \in \mathbb{R}^{m \times n}[x]$ is right invertible if there is a $B \in \mathbb{R}^{n \times m}[x]$ such that $AB = I_{m \times m}$. The matrix $A \in \mathbb{R}^{m \times n}[x]$ is left invertible if there is a $B \in \mathbb{R}^{n \times m}[x]$ such that $BA = I_{n \times n}$. The matrix $A \in \mathbb{R}^{n \times n}[x]$ is invertible if it is both right invertible and left invertible.

Example A.2—Invertibility

The following unimodular matrix $Q \in \mathbb{R}^{2 \times 2}[x]$ is invertible

$$Q(x) = \begin{pmatrix} 1 & x^2 + x \\ 0 & 1 \end{pmatrix} \qquad (A.68)$$

where the inverse $Q^{-1}$ is a polynomial matrix.

Definition A.28—Matrix equivalence

The matrices $A$ and $B$ in $\mathbb{R}^{m \times n}[x]$ are said to be equivalent if there are unimodular matrices $U \in \mathbb{R}^{m \times m}[x]$ and $V \in \mathbb{R}^{n \times n}[x]$ such that $A = UBV$.
Definition A.29—Left divisors

If $A \in \mathbb{R}^{m \times n}[x]$ can be factorized as $BC$, where $B$ and $C$ are polynomial matrices and $B$ has linearly independent columns, then $B$ is a left divisor of $A$.

Definition A.30—Right divisors

If $A \in \mathbb{R}^{m \times n}[x]$ can be factorized as $BC$, where $B$ and $C$ are polynomial matrices and $C$ has linearly independent rows, then $C$ is a right divisor of $A$.

Definition A.31—Greatest common left divisor (g.c.l.d.)

Let $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{m \times p}[x]$ be polynomial matrices, and let $U$ belong to $\mathbb{R}^{m \times k}[x]$ for some $k$. If $U$ is a left divisor of both $A$ and $B$, and if every other left divisor of both $A$ and $B$ is also a left divisor of $U$, then $U$ is called the greatest common left divisor (g.c.l.d.) of $A$ and $B$.

Definition A.32—Greatest common right divisor (g.c.r.d.)

Let $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{p \times n}[x]$ be polynomial matrices, and let $V$ belong to $\mathbb{R}^{k \times n}[x]$ for some $k$. If $V$ is a right divisor of both $A$ and $B$, and if every other right divisor of both $A$ and $B$ is also a right divisor of $V$, then $V$ is called the greatest common right divisor (g.c.r.d.) of $A$ and $B$.

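Result A.9

For every pair of polynomial matrices $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{m \times p}[x]$ there exists a g.c.l.d. If $\operatorname{rank}([A \;\; B]) = r$, then any g.c.l.d. $U$ has $r$ columns and can be expressed as

$$U = AX + BY, \qquad U \in \mathbb{R}^{m \times r}[x] \qquad (A.69)$$

for some $X \in \mathbb{R}^{n \times r}[x]$ and $Y \in \mathbb{R}^{p \times r}[x]$.

Result A.10

For every pair of polynomial matrices $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{p \times n}[x]$ there exists a g.c.r.d. If $\operatorname{rank}([A^T \;\; B^T]) = r$, then any g.c.r.d. $V$ has $r$ rows and can be expressed as

$$V = XA + YB, \qquad V \in \mathbb{R}^{r \times n}[x] \qquad (A.70)$$

for some $X \in \mathbb{R}^{r \times m}[x]$ and $Y \in \mathbb{R}^{r \times p}[x]$.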

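Definition A.33—Smith form

For any polynomial matrix $A(x) \in \mathbb{R}^{m \times n}[x]$ of rank $r$ it is possible to find unimodular matrices $U(x) \in \mathbb{R}^{m \times m}[x]$ and $V(x) \in \mathbb{R}^{n \times n}[x]$ such that

$$A(x) = U(x)\Sigma(x)V(x) \qquad (A.71)$$

where

$$\Sigma(x) = \begin{pmatrix} \operatorname{diag}(\sigma_1(x), \ldots, \sigma_r(x)) & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix} \in \mathbb{R}^{m \times n}[x] \qquad (A.72)$$

and the $\{\sigma_i(x)\}$ are unique monic polynomials obeying the division property

$$\sigma_i(x) \mid \sigma_{i+1}(x), \qquad i = 1, \ldots, r-1 \qquad (A.73)$$

i.e., $\sigma_i(x)$ divides $\sigma_{i+1}(x)$. The matrix $\Sigma(x)$ is called the Smith form of $A(x)$ and the $\{\sigma_i(x)\}$ are the invariant polynomials of $A(x)$.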

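Result A.11—Invariant polynomials

Let $\Delta_i(x)$ denote the (monic) greatest common divisor of all $i \times i$ minors of a polynomial matrix $A(x)$ of rank $r$, and let $\Delta_0(x) = 1$. Then the invariant polynomials can be obtained as

$$\sigma_i(x) = \frac{\Delta_i(x)}{\Delta_{i-1}(x)}, \qquad i = 1, \ldots, r \qquad (A.74)$$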
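For a nonsingular 2 × 2 polynomial matrix, Eq. (A.74) needs only $\Delta_1$, the gcd of the four entries, and $\Delta_2$, the determinant made monic. The following Python sketch uses exact rational coefficients. Polynomials are stored as coefficient lists, lowest degree first, and all helper names are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mul(p, q):
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_sub(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def poly_divmod(p, q):
    """Quotient and remainder of p / q for a non-zero polynomial q."""
    p, q = trim(p), trim(q)
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    r = p
    while r and len(r) >= len(q):
        c, d = r[-1] / q[-1], len(r) - len(q)
        quot[d] = c
        r = poly_sub(r, [0] * d + [c * t for t in q])  # cancels the leading term
    return trim(quot), r


def monic(p):
    p = trim(p)
    return [c / p[-1] for c in p] if p else []


def poly_gcd(p, q):
    """Monic greatest common divisor by the Euclidean algorithm."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, poly_divmod(p, q)[1]
    return monic(p)


def invariant_polynomials_2x2(A):
    """(sigma_1, sigma_2) = (Delta_1, Delta_2 / Delta_1), Eq. (A.74); det A must be non-zero."""
    delta1 = []
    for row in A:
        for entry in row:
            delta1 = poly_gcd(delta1, entry)
    delta2 = monic(poly_sub(poly_mul(A[0][0], A[1][1]), poly_mul(A[0][1], A[1][0])))
    return delta1, poly_divmod(delta2, delta1)[0]


# A(x) = diag(x + 1, x + 2): the entries have gcd 1, so the Smith form is diag(1, (x+1)(x+2))
sigma1, sigma2 = invariant_polynomials_2x2([[[1, 1], []], [[], [2, 1]]])
```

For $A(x) = \operatorname{diag}(x + 1, x + 2)$ the sketch returns $\sigma_1 = 1$ and $\sigma_2 = x^2 + 3x + 2$. The diagonal form $\operatorname{diag}(x+1, x+2)$ is therefore not itself the Smith form, since $x + 1$ does not divide $x + 2$.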


Definition A.34—Left co-prime matrices

The polynomial matrices $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{m \times p}[x]$ are said to be left co-prime (or relatively left prime or mutually left prime) if the g.c.l.d. is unimodular.

Definition A.35—Right co-prime matrices

The polynomial matrices $A \in \mathbb{R}^{m \times n}[x]$ and $B \in \mathbb{R}^{p \times n}[x]$ are said to be right co-prime (or relatively right prime or mutually right prime) if the g.c.r.d. is unimodular.
Statistical Inference


B.1 PRELIMINARIES

A random variable or a stochastic variable has a value which depends on chance and which cannot be predicted from a knowledge of the experimental conditions. To describe the outcome of a random variable $X$ it is common practice to introduce the probability distribution function

$$F(x) = P\{X \le x\}, \qquad 0 \le F(x) \le 1, \quad \forall x \in \mathbb{R} \qquad (B.1)$$

where $P\{X \le x\}$ denotes the probability that $X \le x$. The derivative $f(x)$ of the distribution function $F(x)$ is called the probability density function, and the solution $x_\alpha$ to the equation

$$P\{X \le x_\alpha\} = \alpha \qquad (B.2)$$

is called the $\alpha$-percentile of the distribution.

In cases when there is no risk of confusion we use $x$ to denote also the random variable. The statistical mean $\mu_y$ or expectation of a variable $y$, which is a function of a random variable $X$, is defined as

$$\mu_y = E\{y\} = \int_{-\infty}^{\infty} y(x) f(x)\, dx \qquad (B.3)$$

and the mean of the distribution is

$$\mu_x = E\{x\} = \int_{-\infty}^{\infty} x f(x)\, dx \qquad (B.4)$$


The variance of a scalar variable $y(x)$ is defined as

$$\operatorname{Var}\{y\} = E\{(y - \mu_y)^2\} = \int_{-\infty}^{\infty} (y(x) - \mu_y)^2 f(x)\, dx \qquad (B.5)$$

In the case of vector-valued variables it is standard to use the definition of covariance

$$\operatorname{Var}\{y\} = \operatorname{Cov}\{y\} = E\{(y - \mu_y)(y - \mu_y)^T\} = \int_{-\infty}^{\infty} (y(x) - \mu_y)(y(x) - \mu_y)^T f(x)\, dx \qquad (B.6)$$

because it also describes the statistical relations between the components of the vector $y$. The covariance between two variables $x$ and $y$ is

$$\operatorname{Cov}\{x, y\} = E\{(x - \mu_x)(y - \mu_y)^T\} = E\{xy^T\} - \mu_x \mu_y^T \qquad (B.7)$$


Definition B.1—Statistically independent variables

Two random variables $x$ and $y$ are said to be statistically independent if their joint probability density function factorizes as

$$f(x, y) = f_1(x) f_2(y), \qquad \text{for some } f_1, f_2 \qquad (B.8)$$
Definition B.2—Statistical covariance and correlation
The correlation coefficient between two variables $x$ and $y$ is

$$\rho = \frac{\operatorname{Cov}\{x, y\}}{\sigma_x \sigma_y}, \qquad \text{where } \sigma_x^2 = \operatorname{Var}\{x\}, \quad \sigma_y^2 = \operatorname{Var}\{y\} \qquad (B.9)$$

and two variables $x$, $y$ are uncorrelated if

$$\operatorname{Cov}\{x, y\} = 0 \qquad (B.10)$$


Definition B.3—The $p$th moment

The $p$th moment of a probability distribution $F(x)$ is the mathematical expectation

$$E\{x^p\} = \int_{-\infty}^{\infty} x^p f(x)\, dx \qquad (B.11)$$


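The sample mean of a set of $N$ observed variables $x_i$ for $i = 1, \ldots, N$ is defined as

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (B.12)$$

and the sample (co)variance is

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)(x_i - \mu_x)^T \qquad (B.13)$$

or, for unknown $\mu_x$, we calculate the sample variance

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \qquad (B.14)$$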
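For scalar data, (B.13) and (B.14) reduce to the familiar population and sample variances. The Python sketch below uses illustrative function names and assumes the usual unbiased $1/(N-1)$ normalization in (B.14). It agrees with the standard-library `statistics` module.

```python
import statistics


def sample_mean(xs):
    """Eq. (B.12)."""
    return sum(xs) / len(xs)


def sample_var_known_mean(xs, mu):
    """Eq. (B.13) for scalar data: average squared deviation from the true mean mu."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_var_unknown_mean(xs):
    """Eq. (B.14) for scalar data: the sample mean replaces mu, so divide by N - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data), sample_var_known_mean(data, 5.0), statistics.pvariance(data, 5.0))  # prints 5.0 4.0 4.0
```

Dividing by $N - 1$ compensates for the degree of freedom used up by estimating the mean with $\bar{x}$. For these data (B.14) gives $32/7$, which matches `statistics.variance`.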


B.2 CONVERGENCE AND CONSISTENCY

It is often desirable to provide an interval $[\theta_L, \theta_U]$ in which the value of a parameter $\theta$ can be expected to lie with some probability

$$P\{\theta_L \le \theta \le \theta_U\} = 1 - \alpha \qquad (B.15)$$
In the case $\alpha = 0.05$ this interval $[\theta_L, \theta_U]$ is called a 95% confidence interval. More generally, the interval $[\theta_L, \theta_U]$ associated with a probability level $1 - \alpha$ is called a $100(1 - \alpha)$-percent confidence interval, and the statistics $\theta_L$ and $\theta_U$ are the lower and upper confidence limits. In hypothesis testing, $\alpha$ is called the significance level of a test.

Definition B.4—Convergence in $L^p$, $0 < p < \infty$

The sequence $\{x_k\}$ is said to converge in $L^p$ to $x$ if and only if

$$\lim_{k \to \infty} E\{|x_k - x|^p\} = 0 \qquad (B.16)$$

Definition B.5—Convergence almost surely (a.s.)

The sequence $\{x_k\}$ is said to converge with probability one (w.p.1) or almost surely (a.s.) to $x$ if and only if for every $\varepsilon > 0$ we have

$$\lim_{n \to \infty} P\{|x_k - x| \le \varepsilon \ \text{for all}\ k \ge n\} = 1 \qquad (B.17)$$

Definition B.6—Convergence in probability (in pr.)

The sequence $\{x_k\}$ is said to converge in probability (in pr.) to $x$ if and only if for every $\varepsilon > 0$ we have

$$\lim_{k \to \infty} P\{|x_k - x| > \varepsilon\} = 0 \qquad (B.18)$$

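Definition B.7—Convergence in distribution (in dist.)

The sequence $\{x_k\}$ is said to converge in distribution (in dist.) to $F$ if and only if the sequence $\{F_k\}$ of corresponding distribution functions converges to $F$ at every point where $F$ is continuous.

A basic result of probability theory is the following set of implications between the convergence concepts

$$\begin{array}{ccccc} x_k \xrightarrow{L^p} x & \Rightarrow & x_k \xrightarrow{\text{prob.}} x & \Rightarrow & x_k \xrightarrow{\text{dist.}} F \\ & & \Uparrow & & \\ & & x_k \xrightarrow{\text{a.s.}} x & & \end{array} \qquad (B.19)$$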


Theorem B.1—Central limit theorem

Let $\{x_k\}$ be a sequence of independent random variables with common distribution function $F$ with finite mean $\mu$ and variance $\sigma^2$. If $S_N = x_1 + x_2 + \cdots + x_N$, then

$$X_N = \frac{S_N - N\mu}{\sigma\sqrt{N}} \xrightarrow{\text{dist.}} \mathcal{N}(0, 1) \qquad (B.20)$$

i.e., $X_N$ has a limiting normal distribution with mean 0 and variance 1 as $N \to \infty$.

The central limit theorem is commonly used as an approximation theorem in 
identification theory and is also often used to justify assumptions on normal 
distribution of variables. 
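A quick simulation illustrates (B.20). The Python sketch below assumes $x_k$ uniformly distributed on $[0, 1]$ (mean $1/2$, variance $1/12$), a fixed random seed, and illustrative function names. With $N = 30$, the fraction of standardized sums that fall in $[-1.96, 1.96]$ should come out close to 0.95.

```python
import math
import random


def standardized_sum(n, rng):
    """X_N = (S_N - N mu) / (sigma sqrt(N)) for N draws from Uniform(0, 1)."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of Uniform(0, 1)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))


def coverage(n, trials=20000, seed=1):
    """Fraction of standardized sums inside [-1.96, 1.96]; close to 0.95 if (B.20) applies."""
    rng = random.Random(seed)
    inside = sum(abs(standardized_sum(n, rng)) <= 1.96 for _ in range(trials))
    return inside / trials


print(coverage(1), coverage(30))
```

For $N = 1$ every standardized draw lies inside the interval (coverage 1.0), because a single uniform variable satisfies $|X_1| \le \sqrt{3} < 1.96$. This is a reminder that the normal approximation is only asymptotic.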

The sample mean $\bar{x}$ and the sample variance $s^2$ are estimates of the mean $\mu$ and the variance $\sigma^2$ by means of functions of data. In general, we use the notion statistic (i.e., function of data) in comparing, describing, estimating, and in making decisions on the basis of the results of samples. Of all possible estimates of a parameter or variable some are better than others, and it is important to distinguish good estimates from poor ones. For this reason it makes sense to introduce the notions of efficiency and consistency.


Definition B.8—Efficient estimate

An estimate $\hat{\theta}$ is said to be an efficient estimate of a parameter $\theta$ if

$$E\{(\hat{\theta} - \theta)^2\} \le E\{(\theta' - \theta)^2\} \qquad (B.21)$$

for any other estimate $\theta'$.


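Definition B.9—Consistent estimate

Let an estimate of a parameter $\theta$ based on $N$ samples be denoted $\hat{\theta}_N$. The estimate $\hat{\theta}_N$ is said to be a consistent estimate (in $L^2$ or in quadratic mean) of $\theta$ if

$$\lim_{N \to \infty} E\{(\hat{\theta}_N - \theta)^2\} = 0 \qquad (B.22)$$

Consistency (in probability) of an estimate $\hat{\theta}_N$ is defined as the convergence in probability of $\hat{\theta}_N$ to $\theta$, i.e.,

$$\lim_{N \to \infty} P\{|\hat{\theta}_N - \theta| > \varepsilon\} = 0, \qquad \text{for any } \varepsilon > 0 \qquad (B.23)$$

A shorthand way of writing Eq. (B.23) is

$$\operatorname{plim} \hat{\theta}_N = \theta \qquad (B.24)$$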


where "plim" is called the probability limit. An advantage is that the probability limit has attractive algebraic properties. For instance, it can be shown that for any continuous function $f(\theta)$ it holds that

$$\operatorname{plim} f(\hat{\theta}_N) = f(\operatorname{plim} \hat{\theta}_N) \qquad (B.25)$$

and, for random matrices $A$ and $B$ of compatible dimensions,

$$\operatorname{plim}(AB) = \operatorname{plim}(A)\operatorname{plim}(B) \qquad (B.26)$$

provided that such probability limits exist (see Wilks, 1962).

Definition B.10—Unbiased and asymptotically unbiased estimates
An estimate $\hat{\theta}_N$ of $\theta$ based on $N$ data is said to be unbiased if $E\{\hat{\theta}_N\} = \theta$. The estimate is said to be asymptotically unbiased if

$$\lim_{N \to \infty} E\{\hat{\theta}_N\} = \theta \qquad (B.27)$$

■


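Theorem B.2—The Cramér–Rao lower bound

Let $y$ be observations of a stochastic variable, the distribution of which depends on an unknown vector $\theta$. Let $L(y, \theta) = p(y \mid \theta)$ denote the likelihood function and let $\hat{\theta} = \hat{\theta}(y)$ be an arbitrary unbiased estimate of $\theta$. Then

$$\operatorname{Cov}\{\hat{\theta}\} \ge \left(E\left\{\Big(\frac{\partial \log L}{\partial \theta}\Big)^T \frac{\partial \log L}{\partial \theta}\right\}\right)^{-1} = -\left(E\left\{\frac{\partial^2 \log L}{\partial \theta\, \partial \theta^T}\right\}\right)^{-1} \qquad (B.28)$$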


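Proof: Consider the covariance matrix

$$\operatorname{Cov}\left\{\begin{pmatrix} (\partial \log p(y \mid \theta)/\partial \theta)^T \\ \hat{\theta} \end{pmatrix}\right\} = \begin{pmatrix} I_\theta & E\{(\partial \log p/\partial \theta)^T (\hat{\theta} - \theta)^T\} \\ E\{(\hat{\theta} - \theta)\, \partial \log p/\partial \theta\} & \operatorname{Cov}\{\hat{\theta}\} \end{pmatrix} \qquad (B.29)$$

where the upper-left block contains the Fisher information matrix

$$I_\theta = E\left\{\Big(\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big)^T \frac{\partial \log p(y \mid \theta)}{\partial \theta}\right\} \qquad (B.30)$$

and where the lower-right block is the covariance of $\hat{\theta}$. Using the relationship

$$\frac{\partial p(y \mid \theta)}{\partial \theta} = \frac{\partial \log p(y \mid \theta)}{\partial \theta}\, p(y \mid \theta) \qquad (B.31)$$

and beginning with the lower-left off-diagonal block of Eq. (B.29), we find that

$$E\Big\{(\hat{\theta} - \theta)\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big\} = \int \hat{\theta}(y)\frac{\partial p(y \mid \theta)}{\partial \theta}\, dy - \theta\, E\Big\{\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big\} \qquad (B.32)$$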
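As $\hat{\theta} \in \mathbb{R}^p$ is assumed to be an unbiased estimate of $\theta$, we have

$$E\{\hat{\theta}\} = \int \hat{\theta}(y)\, p(y \mid \theta)\, dy = \theta \qquad (B.33)$$

so that

$$\frac{\partial}{\partial \theta} E\{\hat{\theta}\} = \int \hat{\theta}(y)\frac{\partial p(y \mid \theta)}{\partial \theta}\, dy = I \qquad (B.34)$$

provided that differentiation under the integral sign is allowed. In addition, as

$$0 = \frac{\partial}{\partial \theta}\int p(y \mid \theta)\, dy = \int \frac{\partial \log p(y \mid \theta)}{\partial \theta}\, p(y \mid \theta)\, dy = E\Big\{\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big\} \qquad (B.35)$$

the off-diagonal blocks of Eq. (B.29) are identity matrices, and one can factorize the matrix (B.29) according to Result A.5 from Appendix A so that

$$\begin{pmatrix} I_\theta & I \\ I & \operatorname{Cov}\{\hat{\theta}\} \end{pmatrix} = \begin{pmatrix} I & 0 \\ I_\theta^{-1} & I \end{pmatrix}\begin{pmatrix} I_\theta & 0 \\ 0 & \operatorname{Cov}\{\hat{\theta}\} - I_\theta^{-1} \end{pmatrix}\begin{pmatrix} I & I_\theta^{-1} \\ 0 & I \end{pmatrix} \qquad (B.36)$$

The requirement of a covariance matrix being non-negative definite and symmetric by construction entails that the block diagonal matrix in Eq. (B.36) is non-negative definite, or that $\operatorname{Cov}\{\hat{\theta}\} \ge I_\theta^{-1}$ as stated by the theorem. Finally, the equality of Eq. (B.28) is shown by differentiating Eq. (B.35) with respect to $\theta$ and using Eq. (B.31), which gives the relationship

$$0 = \frac{\partial}{\partial \theta}\int \Big(\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big)^T p(y \mid \theta)\, dy = \int \frac{\partial^2 \log p(y \mid \theta)}{\partial \theta\, \partial \theta^T}\, p(y \mid \theta)\, dy + \int \Big(\frac{\partial \log p(y \mid \theta)}{\partial \theta}\Big)^T \frac{\partial \log p(y \mid \theta)}{\partial \theta}\, p(y \mid \theta)\, dy \qquad (B.37)$$

The right-hand side of Eq. (B.37) equals $E\{\partial^2 \log p/\partial\theta\,\partial\theta^T\} + I_\theta$, which proves the theorem.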
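As an illustration (an assumed example, not taken from the text above), consider $N$ independent samples from $\mathcal{N}(\theta, \sigma^2)$ with known $\sigma$. The Fisher information is then $I_\theta = N/\sigma^2$, and the sample mean is an unbiased estimate whose variance $\sigma^2/N$ attains the bound (B.28). A seeded Monte Carlo sketch in Python, with illustrative names:

```python
import random
import statistics


def crb_normal_mean(sigma, n):
    """Cramer-Rao bound I_theta^{-1} = sigma^2 / n for the mean of n i.i.d. N(theta, sigma^2) samples."""
    return sigma ** 2 / n


def mc_var_sample_mean(theta, sigma, n, trials=5000, seed=2):
    """Monte Carlo variance of the sample mean, an unbiased estimate of theta."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(theta, sigma) for _ in range(n)) for _ in range(trials)]
    return statistics.variance(means)


print(mc_var_sample_mean(1.0, 2.0, 10), crb_normal_mean(2.0, 10))
```

The simulated variance should lie close to the bound $0.4$. Any other unbiased estimate, such as the sample median, would give a larger value.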


B.3 SOME IMPORTANT PROBABILITY DISTRIBUTIONS

The probability density function of the normal distribution is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)} \qquad (B.38)$$

for a variable $x$ with mean $\mu$ and variance $\sigma^2$ (see Fig. B.1). Hence, the normal distribution is parametrized by its mean $\mu$ and its variance $\sigma^2$, and it is often denoted $\mathcal{N}(\mu, \sigma^2)$ in statistical practice. In consequence, the statement that a variable $x$ is normally distributed is denoted $x \in \mathcal{N}(\mu, \sigma^2)$. The distribution $\mathcal{N}(0, 1)$ is called the standard normal distribution, and a stochastic variable $x \in \mathcal{N}(0, 1)$ is called a standard normal random variable.

Figure B.1 The probability density function of the normal distribution.

A multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ with mean $\mu \in \mathbb{R}^p$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$ has the probability density function

$$f(x) = \frac{1}{(2\pi)^{p/2}(\det\Sigma)^{1/2}} \exp\Big(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Big) \qquad (B.39)$$

The probability density function is symmetric around the mean $\mu$, and of particular interest are the two-tail probabilities $P\{|x - \mu| > \delta\sigma\}$, where $\delta$ is chosen such that $P\{|x - \mu| > \delta\sigma\} \le \alpha$. The intervals $[\mu - \delta\sigma, \mu + \delta\sigma]$ around the mean define a confidence interval at the probability level $1 - \alpha$. Standard choices of $\alpha$ and the corresponding two-tail probabilities for the standard normal distribution $\mathcal{N}(0, 1)$ are found in Table B.1.
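The two-tail values $\delta$ follow from the inverse of the standard normal distribution function, $\delta = \Phi^{-1}(1 - \alpha/2)$, where $\Phi$ denotes the distribution function of $\mathcal{N}(0, 1)$. A Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later) reproduces the $\delta$ column of Table B.1:

```python
from statistics import NormalDist


def two_tail_delta(alpha):
    """delta such that P{|x| > delta} = alpha for a standard normal variable x."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.001, 0.005, 0.01, 0.05):
    print(f"alpha = {alpha}: delta = {two_tail_delta(alpha):.2f}")
```

The interval $[\mu - \delta\sigma, \mu + \delta\sigma]$ then has coverage $1 - \alpha$ for any $\mathcal{N}(\mu, \sigma^2)$ variable.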


Table B.1 Some confidence intervals at various probability levels for the normal distribution

P{|x| > δ}, x ∈ N(0,1)    δ       Confidence level    Confidence interval
α = 0.001                 3.29    99.9%               [μ − 3.29σ, μ + 3.29σ]
α = 0.005                 2.81    99.5%               [μ − 2.81σ, μ + 2.81σ]
α = 0.01                  2.58    99%                 [μ − 2.58σ, μ + 2.58σ]
α = 0.05                  1.96    95%                 [μ − 1.96σ, μ + 1.96σ]

The $\chi^2$-distribution

By the $\chi^2$-distribution we mean the distribution of a sum of squares of the form

$$\chi^2 = x_1^2 + x_2^2 + \cdots + x_k^2 \qquad (B.40)$$

of $k$ independent, standard normal random variables $\{x_i\}_{i=1}^{k}$, where $k$ is called the number of degrees of freedom. The mean of the $\chi^2$-distribution with $k$ degrees of freedom is $\mu = k$ and the variance is $\sigma^2 = 2k$; see Fig. B.2. The probability density function is

$$f(\chi^2) = \frac{1}{2^{k/2}\,\Gamma(k/2)} (\chi^2)^{k/2 - 1} e^{-\chi^2/2}, \qquad 0 < \chi^2 < \infty \qquad (B.41)$$

where $\Gamma(\cdot)$ denotes the standard $\Gamma$-function, which can be found in mathematical software and in statistical tables.

Figure B.2 The probability density function of the $\chi^2$-distribution for some degrees of freedom $k = 2, \ldots, 6$.

The $\alpha$-percentile $\chi^2_\alpha$ is the value for which

$$P\{\chi^2 \le \chi^2_\alpha\} = \alpha \qquad (B.42)$$
Table B.2 Percentiles χ²_α of the χ²-distribution

Degrees of
freedom   χ²_0.005  χ²_0.01  χ²_0.025  χ²_0.05  χ²_0.95  χ²_0.975  χ²_0.99  χ²_0.995
1         0.00      0.00     0.001     0.004    3.84     5.02      6.63     7.88
2         0.010     0.020    0.051     0.103    5.99     7.38      9.21     10.6
3         0.072     0.115    0.216     0.352    7.81     9.35      11.3     12.8
4         0.207     0.297    0.484     0.711    9.49     11.1      13.3     14.9
5         0.412     0.554    0.831     1.15     11.1     12.8      15.1     16.7
6         0.676     0.872    1.24      1.64     12.6     14.4      16.8     18.5
7         0.989     1.24     1.69      2.17     14.1     16.0      18.5     20.3
8         1.34      1.65     2.18      2.73     15.5     17.5      20.1     22.0
9         1.73      2.09     2.70      3.33     16.9     19.0      21.7     23.6
10        2.16      2.56     3.25      3.94     18.3     20.5      23.2     25.2
20        7.43      8.26     9.58      10.9     31.4     34.2      37.6     40.0
30        13.8      15.0     16.8      18.5     43.8     47.0      50.9     53.7
40        20.7      22.1     24.4      26.5     55.8     59.3      63.7     66.8
50        28.0      29.7     32.2      34.8     67.5     71.4      76.2     79.5


A two-sided confidence interval at the probability level $1 - \alpha$ for a $\chi^2$-distributed variable is $[\chi^2_{\alpha/2}, \chi^2_{1-\alpha/2}]$ (see Table B.2). For degrees of freedom $k > 30$ it is possible to approximate the percentile as $\chi^2_\alpha \approx \frac{1}{2}\big(z_\alpha + \sqrt{2k - 1}\big)^2$, where $z_\alpha$ is the corresponding percentile of the standard normal distribution.
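The large-$k$ approximation can be compared with Table B.2. A Python sketch using `statistics.NormalDist` for $z_\alpha$ (the function name is illustrative):

```python
import math
from statistics import NormalDist


def chi2_percentile_approx(alpha, k):
    """Approximate chi^2_alpha for k degrees of freedom: 0.5 * (z_alpha + sqrt(2k - 1))^2."""
    z_alpha = NormalDist().inv_cdf(alpha)  # percentile of N(0, 1)
    return 0.5 * (z_alpha + math.sqrt(2 * k - 1)) ** 2


# compare with the k = 50 row of Table B.2 (34.8 and 67.5)
print(round(chi2_percentile_approx(0.05, 50), 1), round(chi2_percentile_approx(0.95, 50), 1))
```

For $k = 50$ this gives about 34.5 and 67.2, within about 1% of the tabulated values. The approximation becomes less accurate in the extreme tails and for small $k$.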


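The F-distribution

The F-distribution is defined as the distribution of the ratio of two independent $\chi^2$-variables with $n_1$ and $n_2$ degrees of freedom,

$$F = \frac{\chi_1^2/n_1}{\chi_2^2/n_2} \qquad (B.43)$$

with the probability density function

$$f(F) = \frac{\Gamma\big(\frac{1}{2}(n_1 + n_2)\big)}{\Gamma\big(\frac{1}{2}n_1\big)\,\Gamma\big(\frac{1}{2}n_2\big)} \Big(\frac{n_1}{n_2}\Big)^{n_1/2} \frac{F^{n_1/2 - 1}}{\big(\frac{n_1}{n_2}F + 1\big)^{(n_1 + n_2)/2}}, \qquad 0 < F < \infty \qquad (B.44)$$

where $\Gamma(\cdot)$ is the standard $\Gamma$-function. The mean of the F-distribution is

$$E\{F\} = \frac{n_2}{n_2 - 2}, \qquad n_2 > 2 \qquad (B.45)$$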





By definition the $\alpha$-percentile $F_\alpha$ satisfies the relation

$$\alpha = P\{F \le F_\alpha\} = 1 - P\{F > F_\alpha\} \qquad (B.46)$$

From the fact that the F-statistic is a ratio and from Eq. (B.46) it can be shown that the percentiles $F_\alpha$ and $F_{1-\alpha}$ are related as follows

$$F_\alpha(n_1, n_2) = \frac{1}{F_{1-\alpha}(n_2, n_1)} \qquad (B.47)$$

which in turn helps to determine confidence intervals $[F_{\alpha/2}, F_{1-\alpha/2}]$ for $F$.


B.4 CONDITIONAL EXPECTATION

When analyzing data the observer is often met with the question of how the outcome of a variable $X$ may be influenced by the outcome of another variable $Y$. The conditional probability is defined as the ratio

$$P\{X \le x \mid Y \le y\} = \frac{P\{X \le x, Y \le y\}}{P\{Y \le y\}} \qquad (B.48)$$

and the associated conditional probability density function is

$$f(x \mid y) = \frac{f(x, y)}{f(y)} \qquad (B.49)$$


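Example B.1—Conditional normal distribution

Assume that $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are correlated normally distributed variables with expected means $\mu_x$ and $\mu_y$, so that

$$z = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix}, \qquad P = E\{zz^T\} = \begin{pmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{pmatrix}, \qquad P_{yx} = P_{xy}^T \qquad (B.50)$$

The conditional probability density function is then

$$f(x \mid y) = \frac{f(x, y)}{f(y)} = \frac{\exp\big(-\frac{1}{2} z^T P^{-1} z\big)}{\sqrt{(2\pi)^{n}\det P/\det P_{yy}}\; \exp\big(-\frac{1}{2}(y - \mu_y)^T P_{yy}^{-1} (y - \mu_y)\big)} \qquad (B.51)$$

According to basic matrix algebra (see Appendix A) this can be simplified to

$$f(x \mid y) = \frac{\exp\big(-\frac{1}{2}(x - \mu_{x|y})^T Q^{-1} (x - \mu_{x|y})\big)}{\sqrt{(2\pi)^n \det Q}} \qquad (B.52)$$

where the conditional mean $\mu_{x|y}$ and covariance $Q$ of $x$ given the observation $y$ are

$$\mu_{x|y} = E\{x \mid y\} = \mu_x + P_{xy}P_{yy}^{-1}(y - \mu_y), \qquad Q = \operatorname{Cov}\{x \mid y\} = P_{xx} - P_{xy}P_{yy}^{-1}P_{yx} \qquad (B.53)$$

■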
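In the scalar case $n = m = 1$ the formulas (B.53) reduce to simple arithmetic. The Python sketch below (names and numbers illustrative) also checks that $Q = \det P/\det P_{yy}$. This determinant identity is what lets the normalizing constant of (B.51) turn into that of (B.52).

```python
def conditional_normal(mu_x, mu_y, p_xx, p_xy, p_yy, y):
    """Scalar version of Eq. (B.53): mean and variance of x given an observed y."""
    gain = p_xy / p_yy                      # P_xy P_yy^{-1}
    mean = mu_x + gain * (y - mu_y)
    var = p_xx - gain * p_xy                # P_xx - P_xy P_yy^{-1} P_yx
    return mean, var


mean, var = conditional_normal(mu_x=1.0, mu_y=0.0, p_xx=4.0, p_xy=2.0, p_yy=2.0, y=1.0)
det_P = 4.0 * 2.0 - 2.0 * 2.0
print(mean, var, det_P / 2.0)  # prints 2.0 2.0 2.0
```

Observing $y$ shifts the mean toward the correlated direction and reduces the variance from $P_{xx} = 4$ to $Q = 2$. When $P_{xy} = 0$ the observation carries no information about $x$.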


B.5 STATISTICAL HYPOTHESIS TESTING

A statistical hypothesis is a statement about some parameters, e.g., mean or variance, of a probability distribution. In particular, statistical testing of variance properties has become important and is known as analysis of variance. Its application is important in contexts where the result and quality of two different experimental methods should be distinguished by means of statistical analysis of data. A relevant observation is that several tests of differences between two experimental methods can be reduced to the question of equality of the variances of the two methods. This can be stated formally as the two alternatives

$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_a: \sigma_1^2 \neq \sigma_2^2 \qquad (B.54)$$

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two methods. The hypothesis $H_0$, that the two variances are equal, is called the null hypothesis, whereas $H_a$ is called the alternative hypothesis. A test statistic for testing these hypotheses is the ratio of the two sample variances, i.e.,

$$F = \frac{s_1^2}{s_2^2} \qquad (B.55)$$

based on the sample sizes $n_1$ and $n_2$, respectively. Under the assumption of normal distribution and independent samples, and if $H_0$ is true, it follows that the statistic $F$ is $F(n_1 - 1, n_2 - 1)$-distributed. The null hypothesis $H_0$ can be accepted according to the two-sided F-test if

$$F_{\alpha/2}(n_1 - 1, n_2 - 1) < F < F_{1-\alpha/2}(n_1 - 1, n_2 - 1) \qquad (B.56)$$

Therefore, we should reject the null hypothesis $H_0$ for F-values that are either too large or too small.
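If it is desirable to reject $H_0$ only if one variance is larger than the other, then the two hypotheses are

$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_a: \sigma_1^2 > \sigma_2^2 \tag{B.57}$$

The associated one-sided F-test consists of accepting the null hypothesis if

$$F < F_{1-\alpha}(n_1 - 1, n_2 - 1) \tag{B.58}$$

where, in the notation of Eq. (B.46), $F_{1-\alpha}$ is the upper $\alpha$ point of the distribution. This is the value listed in Tables B.3 and B.4 for $\alpha = 0.05$ and $\alpha = 0.01$.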


Example B.2—A one-sided F-test of variance 

In a model order test in identification one wishes to test whether a newly proposed method has a larger variance than the old one. The hypotheses to test are

$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_a: \sigma_1^2 > \sigma_2^2 \tag{B.59}$$

Two samples of sizes $n_1 = 10$ and $n_2 = 20$ are taken, and the sample variances are $s_1^2 = 0.56$ and $s_2^2 = 0.35$, so that the test statistic is $F = s_1^2/s_2^2 = 1.60$. Strictly, $F$ has $(n_1 - 1, n_2 - 1) = (9, 19)$ degrees of freedom. The nearest entry in Table B.3 is the upper 5% point $F_{0.95}(10, 20) = 2.35$, and the exact value for (9, 19) degrees of freedom is somewhat larger. $F$ is clearly below either value, which means that the null hypothesis cannot be rejected and we have insufficient statistical support for distinguishing between the two methods. ■
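The decision rule of Example B.2 amounts to a single comparison. This minimal Python sketch assumes the critical value is read from Table B.3 rather than computed. `one_sided_f_test` is a hypothetical helper name.

```python
def one_sided_f_test(s1_sq, s2_sq, f_crit):
    """One-sided F-test of H0: sigma1^2 = sigma2^2 against Ha: sigma1^2 > sigma2^2.

    f_crit is the upper alpha point of F(n1 - 1, n2 - 1), e.g. as tabulated
    in Table B.3 or B.4. Returns the statistic (B.55) and whether H0 is rejected.
    """
    F = s1_sq / s2_sq
    return F, F >= f_crit      # H0 is accepted when F < f_crit


F, reject = one_sided_f_test(0.56, 0.35, 2.35)   # numbers of Example B.2
print(f"F = {F:.2f}, reject H0: {reject}")       # F = 1.60, reject H0: False
```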


Suppose, instead, that it is desirable to test the hypothesis that the variance $\sigma_1^2$ is equal to some constant $\sigma_0^2$. The hypotheses are

$$H_0: \sigma_1^2 = \sigma_0^2, \qquad H_a: \sigma_1^2 > \sigma_0^2 \tag{B.60}$$

A relevant test statistic is

$$\chi^2 = \frac{(n_1 - 1)s_1^2}{\sigma_0^2} \tag{B.61}$$

where $s_1^2$ is the estimate of $\sigma_1^2$ based on $n_1$ data. Under the assumption of normal distribution and under the null hypothesis, the test statistic (B.61) is $\chi^2(n_1 - 1)$-distributed. In this case we should reject the null hypothesis $H_0$ for values of $\chi^2$ that are too large, i.e., we reject when

$$\chi^2 > \chi^2_{1-\alpha}(n_1 - 1) \tag{B.62}$$

where $\chi^2_{1-\alpha}(n_1 - 1)$ is the $(1-\alpha)$-percentile point, i.e., the upper $\alpha$ point, of the $\chi^2$-distribution with $n_1 - 1$ degrees of freedom.
Table B.3 Upper 5% points of the F-distribution, i.e., the values exceeded with probability 0.05 (columns $n_1$, rows $n_2$)

n2\n1     1     2     3     4     5     6     8    10    20    30     ∞
1       161   200   216   225   230   234   239   242   248   250   254
2      18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.4  19.5  19.5
3      10.1  9.55  9.28  9.12  9.01  8.94  8.85  8.79  8.66  8.62  8.53
4      7.71  6.94  6.59  6.39  6.26  6.16  6.04  5.96  5.80  5.75  5.63
5      6.61  5.79  5.41  5.19  5.05  4.95  4.82  4.74  4.56  4.50  4.36
6      5.99  5.14  4.76  4.53  4.39  4.28  4.15  4.06  3.87  3.81  3.67
7      5.59  4.74  4.35  4.12  3.97  3.87  3.73  3.64  3.44  3.38  3.23
8      5.32  4.46  4.07  3.84  3.69  3.58  3.44  3.35  3.15  3.08  2.93
9      5.12  4.26  3.86  3.63  3.48  3.37  3.23  3.14  2.94  2.86  2.71
10     4.96  4.10  3.71  3.48  3.33  3.22  3.07  2.98  2.77  2.70  2.54
20     4.35  3.49  3.10  2.87  2.71  2.60  2.45  2.35  2.12  2.04  1.84
30     4.17  3.32  2.92  2.69  2.53  2.42  2.27  2.16  1.93  1.84  1.62
∞      3.84  3.00  2.60  2.37  2.21  2.10  1.94  1.83  1.57  1.46  1.00


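A two-sided test is relevant for testing the hypothesis

$$H_0: \sigma_1^2 = \sigma_0^2, \qquad H_a: \sigma_1^2 \ne \sigma_0^2 \tag{B.63}$$

where the null hypothesis is not rejected if

$$\chi^2_{\alpha/2}(n_1 - 1) < \chi^2 < \chi^2_{1-\alpha/2}(n_1 - 1) \tag{B.64}$$

This test is relevant for testing whether a method has a specified precision, e.g., that of a reference method.

Example B.3—A two-sided χ² test of variance
A sum of squares of 10 random zero-mean variables $x_1, \ldots, x_{10}$, taken about their sample mean $\bar{x}$, is

$$\chi^2 = \frac{(n_1 - 1)s_1^2}{\sigma_0^2} = \frac{1}{\sigma_0^2}\sum_{i=1}^{10}(x_i - \bar{x})^2 = 18.4 \tag{B.65}$$

Under the assumption of normal distribution with $\sigma_0^2 = 1$, the 95% acceptance interval for $\chi^2$ is $[\chi^2_{0.025}(9), \chi^2_{0.975}(9)] = [2.70, 19.0]$. Since 18.4 lies inside this interval, we cannot reject the hypothesis that $\sigma^2 = 1$. ■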
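The interval endpoints quoted in Example B.3 can be reproduced with a short numerical computation. This standard-library Python sketch integrates the $\chi^2$ density with Simpson's rule and finds percentiles by bisection, following the convention of Eq. (B.46). It assumes at least 3 degrees of freedom, so that the density vanishes at the origin. The function names are illustrative only.

```python
import math


def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom (k >= 3 assumed)."""
    if x <= 0.0:
        return 0.0
    return math.exp((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2.0) - math.lgamma(k / 2))


def chi2_cdf(x, k, m=2000):
    """P{chi2 <= x} by composite Simpson integration from 0 to x (m even)."""
    h = x / m
    total = chi2_pdf(0.0, k) + chi2_pdf(x, k)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return total * h / 3


def chi2_percentile(alpha, k):
    """chi2_alpha(k) with P{chi2 <= chi2_alpha} = alpha, found by bisection."""
    lo, hi = 0.0, 2.0 * k + 10.0        # the upper bracket covers the percentiles used here
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


chi2_stat = 18.4                                    # value from Eq. (B.65)
low, high = chi2_percentile(0.025, 9), chi2_percentile(0.975, 9)
print(f"[{low:.2f}, {high:.2f}]")                   # about [2.70, 19.02]
print("H0 not rejected" if low < chi2_stat < high else "reject H0")
```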

There is always a risk that a statistical test may lead to false conclusions. Even good tests may reject $H_0$ when it is true or accept $H_0$ when it is false. It is standard practice to classify these decision errors as

$$\text{Type I error: } \mathcal{P}\{\text{reject } H_0 \mid H_0 \text{ is true}\} = \alpha, \qquad \text{Type II error: } \mathcal{P}\{\text{accept } H_0 \mid H_0 \text{ is false}\} \tag{B.66}$$

A type-I error occurs by chance with probability $\alpha$, but it is usually more difficult to quantify the risk of a type-II error. A cautious attitude toward statistical decision and testing is to reject only hypotheses, and to avoid accepting a hypothesis merely because it is not rejected. In particular, accepting a very composite alternative hypothesis or a very specific and “narrow” null hypothesis might lead to wrong inferences.
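The statement that a type-I error occurs with probability $\alpha$ can be illustrated by simulation. The sketch below draws samples for which $H_0: \sigma^2 = 1$ is true. It then applies the one-sided test (B.62) with $n_1 = 10$ and counts the rejections. The upper 5% point of $\chi^2(9)$, approximately 16.92, is taken as a known table value. The trial count and seed are arbitrary choices.

```python
import random


def type_one_error_rate(n, chi2_upper, trials=20000, seed=1):
    """Fraction of H0-true samples rejected by the one-sided variance test.

    Data are drawn from N(0, 1), so H0: sigma^2 = 1 holds; chi2_upper is the
    upper alpha point of the chi-square distribution with n - 1 degrees of freedom.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n
        stat = sum((xi - mean) ** 2 for xi in x)    # (n - 1) s^2 / sigma0^2 with sigma0^2 = 1
        if stat > chi2_upper:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate(10, 16.92)
print(rate)      # close to alpha = 0.05
```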


B.6 THE COCHRAN THEOREM 


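Consider a linear transformation from the vector $U = (U_1, \ldots, U_n)^T$ to $L = (L_1^T, \ldots, L_m^T)^T$, where each $L_i$ is an $n_i$-vector and each $A_i$ is an $n_i \times n$ matrix

$$L = \begin{pmatrix} L_1 \\ \vdots \\ L_m \end{pmatrix} = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix} U = AU, \qquad L_i = A_i U, \quad i = 1, \ldots, m \tag{B.67}$$

Let the rank of $A_i$ be denoted $r_i$, and let $Q_i$ denote the sum of squares of the components of $L_i$

$$Q_i = L_i^T L_i = U^T A_i^T A_i U, \qquad Q = L^T L = L_1^T L_1 + \cdots + L_m^T L_m = Q_1 + \cdots + Q_m \tag{B.68}$$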

Let $U_1, \ldots, U_n$ be independent standard normal variables, and suppose that one can write an identity of the form

$$U^T U = \sum_{i=1}^{n} U_i^2 = Q_1 + \cdots + Q_m \tag{B.69}$$

where each $Q_i$ is a sum of squares of linear combinations of the components of $U$, i.e., $Q_i = U^T A_i^T A_i U$ for matrices $A_i$ of rank $r_i$. Then, if

$$r_1 + \cdots + r_m = n \tag{B.70}$$

it follows that the variables $Q_i$ are independent. Each $Q_i$ is then $\chi^2$-distributed, with a number of degrees of freedom given by its rank $r_i$ as a quadratic form. This result is known as the Cochran theorem and serves as a theoretical basis for a number of statistical tests.

Table B.4 Upper 1% points of the F-distribution, i.e., the values exceeded with probability 0.01 (columns $n_1$, rows $n_2$)

n2\n1     1     2     3     4     5     6     8    10    20    30     ∞
1      4052  5000  5403  5625  5764  5859  5982  6056  6210  6260  6366
2      98.5  99.0  99.2  99.2  99.3  99.4  99.4  99.4  99.4  99.5  99.5
3      34.1  30.8  29.5  28.7  28.2  27.9  27.5  27.3  26.7  26.5  26.1
4      21.2  18.0  16.7  16.0  15.5  15.2  14.8  14.5  14.0  13.8  13.5
5      16.3  13.3  12.1  11.4  11.0  10.7  10.3  10.1  9.55  9.38  9.02
6      13.7  10.9  9.78  9.15  8.75  8.47  8.10  7.87  7.40  7.23  6.88
7      12.2  9.55  8.45  7.85  7.46  7.19  6.84  6.62  6.16  5.99  5.65
8      11.3  8.65  7.59  7.01  6.63  6.37  6.03  5.81  5.36  5.20  4.86
9      10.6  8.02  6.99  6.42  6.06  5.80  5.47  5.26  4.81  4.65  4.31
10     10.0  7.56  6.55  5.99  5.64  5.39  5.06  4.85  4.41  4.25  3.91
20     8.10  5.85  4.94  4.43  4.10  3.87  3.56  3.37  2.94  2.78  2.42
30     7.56  5.39  4.51  4.02  3.70  3.47  3.17  2.98  2.55  2.39  2.01
∞      6.63  4.61  3.78  3.32  3.02  2.80  2.51  2.32  1.88  1.70  1.00

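Example B.4—Application of the Cochran theorem

Let $x_1, \ldots, x_N$ be independent normal variables with mean $\mathcal{E}\{x_i\} = \mu$ and $\operatorname{Cov}\{x_i, x_j\} = \delta_{ij}\sigma^2$, and suppose that

$$\sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^2 = Q_1 + \cdots + Q_k \tag{B.71}$$

each $Q_i$ being a sum of squares of linear combinations of $x_1, \ldots, x_N$. If we consider the special case

$$Q = \sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2 + N\left(\frac{\bar{x} - \mu}{\sigma}\right)^2 = Q_1 + Q_2 \tag{B.72}$$

with $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, we notice that $Q$ can be expressed as the sum of two sums of squares. $Q_1$ is related to the sample variance, whereas $Q_2$ is related to the variance of the sample mean. The variable $Q$ is a sum of squares of $N$ independent standard normal variables and has a $\chi^2$-distribution with $N$ degrees of freedom. Now introduce

$$L_i = \frac{x_i - \bar{x}}{\sigma} = \frac{x_i - \mu}{\sigma} - \frac{\bar{x} - \mu}{\sigma}, \qquad i = 1, \ldots, N \tag{B.73}$$

so that $Q_1 = \sum_i L_i^2$. The sum of these $L_i$ is zero, and so the rank of the transformation defining $Q_1$ does not exceed $N - 1$. The distribution of $Q$ is clearly $\chi^2(N)$, and that of $Q_2$ is $\chi^2(1)$. If $Q_1$ and $Q_2$ are independent, it can be concluded that $Q_1$ is $\chi^2(N-1)$-distributed. Notice now that

$$L = \frac{1}{N}\begin{pmatrix} N-1 & -1 & \cdots & -1 \\ -1 & N-1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ -1 & \cdots & -1 & N-1 \end{pmatrix} X = AX, \qquad X = \begin{pmatrix} (x_1 - \mu)/\sigma \\ \vdots \\ (x_N - \mu)/\sigma \end{pmatrix} \tag{B.74}$$

where the matrix $A$ that relates $L$ and $X$ is of rank $N - 1$. The number of degrees of freedom for the sum is $N$, and the ranks add up, $(N-1) + 1 = N$, as required by the Cochran theorem. It follows that $Q_1$ and $Q_2$ have independent $\chi^2$-distributions with $N - 1$ and 1 degrees of freedom, respectively. ■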
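Example B.4 can be checked numerically. This NumPy sketch builds the matrix $A$ of Eq. (B.74) and confirms its rank $N - 1$. It then verifies the identity (B.72) on simulated data. Finally, it compares the sample means of $Q_1$ and $Q_2$ with the degrees of freedom $N - 1$ and 1. The values of $N$, $\mu$, $\sigma$ and the seed are hypothetical. A near-zero sample correlation is consistent with, though not a proof of, the independence stated by the Cochran theorem.

```python
import numpy as np

N = 6
A = np.eye(N) - np.ones((N, N)) / N          # matrix of Eq. (B.74): (1/N)(N I - ones)
print(np.linalg.matrix_rank(A))              # N - 1 = 5

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5                          # hypothetical parameters
trials = 20000
x = rng.normal(mu, sigma, size=(trials, N))  # each row is one sample x_1, ..., x_N
X = (x - mu) / sigma                          # standardized variables
L = X @ A.T                                   # L_i = (x_i - xbar) / sigma, Eq. (B.73)
Q1 = np.sum(L ** 2, axis=1)
Q2 = N * ((x.mean(axis=1) - mu) / sigma) ** 2
Q = np.sum(X ** 2, axis=1)

print(np.allclose(Q, Q1 + Q2))                # identity (B.72) holds for every sample
print(Q1.mean(), Q2.mean())                   # approximately N - 1 and 1
print(np.corrcoef(Q1, Q2)[0, 1])              # approximately 0
```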


B.7 REFERENCES 

Excellent handbooks of statistics are the following monographs:

- D.R. Cox and D.V. Hinkley, Theoretical Statistics. London: Chapman and Hall, 1974.

- M.G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. I (3rd ed.), 1969; Vol. II (3rd ed.), 1973. New York: Hafner Press.

- S.S. Wilks, Mathematical Statistics. New York: John Wiley, 1962.

Proofs of the theorems included in this summary are found in these works.



Numerical Optimization 


C.1 INTRODUCTION 

We will consider the unconstrained optimization problem

$$\text{Minimize } f(x), \qquad x = \begin{pmatrix} x_1 & \cdots & x_p \end{pmatrix}^T \tag{C.1}$$

where $x$ is the vector of free variables to be found. Noniterative methods for finding the optimum $x$ that minimizes $f(x)$ are grid search and random search. A grid search consists of constructing a $p$-dimensional block of points that covers the region where $x$ is known to be, evaluating $f$ at each of its points, and choosing the minimum. A random search consists of randomly generating a sequence of points lying within a specified region, evaluating $f(x)$, and choosing the minimum. Both methods are extremely inefficient, since they make a large number of unnecessary function evaluations.
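To make the inefficiency concrete, here is a minimal Python sketch of both noniterative methods on a hypothetical two-dimensional function with its minimum at (1, 0). Both searches use the same budget of 2601 function evaluations. The function, the search box and the budget are assumptions for illustration.

```python
import itertools
import random


def grid_search(f, bounds, points_per_axis=51):
    """Evaluate f on a regular p-dimensional grid and return the best point."""
    axes = [[lo + (hi - lo) * i / (points_per_axis - 1) for i in range(points_per_axis)]
            for lo, hi in bounds]
    best = min(itertools.product(*axes), key=f)
    return list(best), f(best)


def random_search(f, bounds, n_samples=2601, seed=0):
    """Evaluate f at uniformly random points in the box and return the best one."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2   # hypothetical test function
box = [(-2.0, 2.0), (-2.0, 2.0)]
gx, gf = grid_search(f, box)                          # 51**2 = 2601 evaluations
rx, rf = random_search(f, box)                        # same budget, random points
print(gx, gf)
print(rx, rf)
```

Even with thousands of evaluations, the grid resolution (0.08 here) limits the accuracy to a few hundredths. The cost grows exponentially with the dimension $p$.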


C.2 DESCENT METHODS 

Several numerical methods for solution of Eq. (C.1) are iterative. Assume that the $i$th iteration has provided the estimate $x^{(i)}$, and that the minimum lies in the direction $d$ at a distance $\rho$ from $x^{(i)}$. The Taylor series expansion around $x^{(i)}$ is then

$$f(x) = f(x^{(i)} + \rho d) = f(x^{(i)}) + \rho\,\nabla f(x)\big|_{x=x^{(i)}}^T d + O(\rho^2) \tag{C.2}$$

where $g = \nabla f(x^{(i)})$ is the gradient of $f$ evaluated at $x^{(i)}$.

In iterative methods we try to improve the estimate $x^{(i)}$ by choosing a value $x^{(i+1)}$ such that $f(x^{(i+1)}) < f(x^{(i)})$. By Eq. (C.2), the search direction $d$ is a descent direction at $x^{(i)}$ if $g^T d < 0$. For instance, for every positive definite matrix $R$ we can suggest the descent direction

$$d = -Rg = -R\,\nabla f(x)\big|_{x=x^{(i)}} \tag{C.3}$$

since $g^T d = -g^T R g < 0$ for nonzero $g = \nabla f$. An iterative algorithm to improve $x^{(i)}$ is

$$x^{(i+1)} = x^{(i)} - \alpha^{(i)} R g \tag{C.4}$$

so that $f(x^{(i+1)}) < f(x^{(i)})$ for suitable choices of the step length $\alpha^{(i)}$. The iteration begins with an initial estimate $x^{(0)}$ and proceeds by iterating

$$x^{(i+1)} = x^{(i)} + \alpha^{(i)} d^{(i)} \tag{C.5}$$

One method is, for instance, to choose the step length so that $f(x^{(i)} - \alpha^{(i)} R^{(i)} g^{(i)})$ is minimized.

The choice $R = I$ is called the method of steepest descent, and the steepest descent search direction is defined by

$$d^{(i)} = -\nabla f(x^{(i)}) \tag{C.6}$$

where $\nabla f(x^{(i)})$ is the gradient of $f$ evaluated at $x^{(i)}$. The steepest descent direction is simple to implement, but it has the disadvantage of being very slow to converge when the level contours $f(x) = c$ are eccentric, i.e., strongly elongated.
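The slow convergence for eccentric level contours can be seen on a quadratic $f(x) = \tfrac{1}{2}x^T Q x + x^T q$. This NumPy sketch uses $R = I$ and chooses the step length by exact minimization along $d$, one of the options mentioned above. The matrices and the starting point are hypothetical. The starting point was picked near the direction in which steepest descent zigzags the most.

```python
import numpy as np


def steepest_descent(Q, q, x0, tol=1e-8, max_iter=10000):
    """Steepest descent (R = I) for f(x) = 0.5 x^T Q x + x^T q.

    For a quadratic, the step length that minimizes f along d = -g is
    alpha = (g^T g) / (g^T Q g).
    """
    x = np.array(x0, dtype=float)
    for i in range(max_iter):
        g = Q @ x + q                      # gradient at the current estimate
        if np.linalg.norm(g) < tol:
            return x, i
        alpha = (g @ g) / (g @ Q @ g)
        x = x - alpha * g                  # search direction d = -g, Eq. (C.6)
    return x, max_iter


q = np.array([-1.0, 0.0])                  # minimum at (1, 0) for both matrices below
round_Q = np.diag([1.0, 1.0])              # circular level contours
eccentric_Q = np.diag([1.0, 50.0])         # eccentric level contours
results = {}
for name, Q in (("round", round_Q), ("eccentric", eccentric_Q)):
    x, n_iter = steepest_descent(Q, q, [0.0, 0.02])
    results[name] = (x, n_iter)
    print(name, n_iter, x)
```

With circular contours a single step reaches the minimum. With an eigenvalue ratio of 50, the same method needs hundreds of zigzagging steps.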


C.3 NEWTON METHODS 

Consider the Taylor series expansion of a twice differentiable function $f(x)$ around $x^{(i)}$

$$f(x) = f(x^{(i)} + \rho d) = f(x^{(i)}) + \rho\,\nabla f(x)\big|_{x=x^{(i)}}^T d + \tfrac{1}{2}\rho^2 d^T\,\nabla^2 f(x)\big|_{x=x^{(i)}}\,d + O(\rho^3) \tag{C.7}$$

If we neglect higher-order terms and minimize with respect to the step, we find that the Newton search direction is

$$d^{(i)} = -\left(\nabla^2 f(x^{(i)})\right)^{-1}\nabla f(x^{(i)}) \tag{C.8}$$

where the Hessian matrix $\nabla^2 f(x^{(i)})$ is the matrix of second partial derivatives of $f$ evaluated at $x^{(i)}$. The search direction can then be found by solving the linear equation

$$\nabla^2 f(x^{(i)})\,d^{(i)} = -\nabla f(x^{(i)}) \tag{C.9}$$

The Newton method has the advantage of very rapid convergence when $x^{(i)}$ is close to the optimum. Its disadvantage is that the matrix of second partial derivatives must be calculated in every iteration. Another problem is that, at some distance from the optimum, the inverse of the Hessian might not exist or the Hessian might not be positive definite.
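The Newton iteration (C.8) and (C.9) can be sketched minimally in NumPy, solving the linear equation instead of forming the inverse Hessian. The test function $f(x) = e^{x_1} - x_1 + (x_2 - x_1)^2$ is a hypothetical example. Its minimum is at the origin and its Hessian is positive definite everywhere, so the unit step converges. The rapid convergence near the optimum shows up as a handful of iterations.

```python
import numpy as np


def grad(x):
    """Gradient of the hypothetical test function f(x) = exp(x1) - x1 + (x2 - x1)^2."""
    s = x[1] - x[0]
    return np.array([np.exp(x[0]) - 1.0 - 2.0 * s, 2.0 * s])


def hessian(x):
    """Hessian of the same function; positive definite for all x."""
    return np.array([[np.exp(x[0]) + 2.0, -2.0],
                     [-2.0, 2.0]])


def newton(x0, tol=1e-10, max_iter=50):
    """Newton iteration with unit step: solve Eq. (C.9) for d, then update x."""
    x = np.array(x0, dtype=float)
    for i in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, i
        d = np.linalg.solve(hessian(x), -g)   # Eq. (C.9); no explicit matrix inverse
        x = x + d
    return x, max_iter


x_opt, n_iter = newton([1.0, 2.0])
print(n_iter, x_opt)       # a few iterations, x_opt very close to (0, 0)
```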


C.4 QUASI-NEWTON METHODS 

There are two major classes of optimization techniques that try to avoid some of these disadvantages: quasi-Newton methods and conjugate gradient methods. Quasi-Newton methods approximate the search direction with

$$d^{(i)} = -H^{(i)}\nabla f(x^{(i)}) \tag{C.10}$$

where $H^{(i)}$ is some positive definite approximation to $\left(\nabla^2 f(x^{(i)})\right)^{-1}$, obtained without evaluation of the Hessian matrix. These methods thus avoid the problems of computing and inverting the matrix of second partial derivatives. Since $H^{(i)}$ is positive definite, they also always give a descent direction.
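The text does not specify how $H^{(i)}$ is formed. One widely used choice is the BFGS update of the inverse Hessian approximation, sketched below in NumPy together with a simple backtracking (Armijo) line search. The test function, the tolerances and the line-search constant are assumptions for illustration. They are not necessarily the choices behind Example C.1.

```python
import numpy as np


def f(x):
    # Hypothetical test function with minimum at (0, 0)
    return np.exp(x[0]) - x[0] + (x[1] - x[0]) ** 2


def grad(x):
    s = x[1] - x[0]
    return np.array([np.exp(x[0]) - 1.0 - 2.0 * s, 2.0 * s])


def bfgs(x0, tol=1e-6, max_iter=200):
    """Quasi-Newton iteration, Eq. (C.10), with the BFGS inverse-Hessian update."""
    x = np.array(x0, dtype=float)
    n = len(x)
    I = np.eye(n)
    H = I.copy()                              # initial approximation of the inverse Hessian
    g = grad(x)
    for i in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, i
        d = -H @ g                            # descent direction, since H is positive definite
        alpha = 1.0                           # backtracking (Armijo) line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d
        g_new = grad(x + s)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                        # update only if the curvature condition holds
            rho = 1.0 / sy
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x + s, g_new
    return x, max_iter


x_opt, n_iter = bfgs([1.0, 2.0])
print(n_iter, x_opt)     # converges to about (0, 0) without any Hessian evaluation
```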
C.5 CONJUGATE GRADIENT METHODS


Given a symmetric matrix $Q$, two vectors $v_1$ and $v_2$ are said to be conjugate (or $Q$-orthogonal) with respect to $Q$ if $v_1^T Q v_2 = 0$. Thus if $Q = I$, conjugacy is equivalent to the usual notion of orthogonality. Conjugate gradient methods compute the search direction as a linear combination of the current gradient vector and the previous search direction, in the form of a recursive equation

$$d^{(i+1)} = -\nabla f(x^{(i+1)}) + \beta^{(i)} d^{(i)} \tag{C.11}$$

where $\beta^{(i)}$ is a scalar parameter chosen to ensure that the sequence of search directions satisfies the conjugacy condition

$$0 = (d^{(i)})^T Q\,d^{(i+1)} = -(d^{(i)})^T Q\,\nabla f(x^{(i+1)}) + \beta^{(i)}(d^{(i)})^T Q\,d^{(i)} \tag{C.12}$$

A suitable choice of $\beta^{(i)}$ is therefore

$$\beta^{(i)} = \frac{(d^{(i)})^T Q\,\nabla f(x^{(i+1)})}{(d^{(i)})^T Q\,d^{(i)}} \tag{C.13}$$

A characteristic property of the search directions used in conjugate gradient methods is that they avoid steps in the same directions as in a few previous steps. This property is accomplished by the condition (C.12). The procedure usually includes some evaluation of the Hessian matrix as a means to define the $Q$-matrix.
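For a quadratic function $f(x) = \tfrac{1}{2}x^T Q x + x^T q$ with symmetric positive definite $Q$, Eqs. (C.11) to (C.13) give the linear conjugate gradient method. In this NumPy sketch the step length along each direction is chosen by exact line search, which is an assumption since the text does not specify it. The matrix and vector are hypothetical. In exact arithmetic the minimum is reached in at most $n$ steps, and the stored directions are mutually $Q$-conjugate.

```python
import numpy as np


def conjugate_gradient(Q, q, x0):
    """Minimize f(x) = 0.5 x^T Q x + x^T q using Eqs. (C.11)-(C.13).

    Each step uses the exact minimizer along the current direction; for an
    n x n symmetric positive definite Q, n steps suffice in exact arithmetic.
    """
    x = np.array(x0, dtype=float)
    g = Q @ x + q                          # gradient
    d = -g                                 # first direction: steepest descent
    directions = []
    for _ in range(len(q)):
        alpha = -(g @ d) / (d @ Q @ d)     # exact line search along d
        x = x + alpha * d
        g = Q @ x + q
        directions.append(d)
        beta = (d @ Q @ g) / (d @ Q @ d)   # Eq. (C.13)
        d = -g + beta * d                  # Eq. (C.11)
    return x, directions


Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # hypothetical SPD matrix
q = np.array([-1.0, -2.0, -3.0])
x, dirs = conjugate_gradient(Q, q, np.zeros(3))
print(np.allclose(x, np.linalg.solve(Q, -q)))                  # True after n = 3 steps
print([abs(dirs[i] @ Q @ dirs[j]) < 1e-10 for i in range(3) for j in range(i)])
```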


Quadratic functions are of particular importance in nonlinear optimization for several reasons. A general smooth function can be approximated locally by a quadratic function using the Taylor series expansion. Also, many numerical optimization methods are based on quadratic approximations of the cost function. We illustrate the behavior of some algorithms when applied to a quadratic function.


Example C.1—Comparison of some numerical optimization methods
We apply some of the above-mentioned methods to minimization of the function

$$f(x) = \tfrac{1}{2}x^T Q_2 x + x^T q_1 + q_0 \tag{C.14}$$
Figure C.1 Typical trajectories of some numerical optimization techniques when
finding the minimum at (1,0) of a quadratic function. Level contours are indicated 
by dotted lines. 


The corresponding gradient is

$$\nabla f(x) = Q_2 x + q_1 \tag{C.15}$$

The steepest descent algorithm was applied with a step-length value $\alpha = 0.3$, and the quasi-Newton method used the matrix


H = 



(C.16) 


The numerical solutions may be compared to the analytic solution of the equation

$$\nabla f(x) = 0 \;\Rightarrow\; x = -Q_2^{-1}q_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{C.17}$$

in the present case. The trajectories of the different methods are shown in Fig. C.1, where all trajectories start with the initial value at the origin. ■

C.6 DIRECT SEARCH METHODS

Consider the optimization problem

$$\text{Minimize } f(x), \qquad x = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T \tag{C.18}$$

In cases where it is not possible to calculate the partial derivatives and the gradients, it might be necessary to use a direct search method. The Nelder-Mead polytope algorithm is one of the most popular and successful direct search methods. In order to estimate $x \in \mathbb{R}^n$, it requires $n + 1$ starting values $x^{(1)}, \ldots, x^{(n+1)}$. These are ordered so that $f(x^{(1)}) \le \cdots \le f(x^{(n+1)})$, and chosen so that $\|x^{(i)} - x^{(j)}\|$ is constant for all $i \ne j$. In other words, $x^{(1)}, \ldots, x^{(n+1)}$ form the corners of a regular polytope (triangle, tetrahedron, etc.). By shifting the corner with the highest cost-function value, $x^{(n+1)}$, the polytope can be moved in space toward the minimum. This can be accomplished, for instance, by reflecting the worst corner $x^{(n+1)}$ through the centroid of the remaining corners

$$c = \frac{1}{n}\sum_{i=1}^{n} x^{(i)} \tag{C.19}$$

The resulting algorithm is

$$x^{(n+2)} = c + \alpha\,(c - x^{(n+1)}) \tag{C.20}$$

for some step length $\alpha$ chosen so that the cost function $f(x^{(n+2)})$ shows improvement as compared to the starting values $x^{(i)}$. If the new estimate $x^{(n+2)}$ is such that

$$f(x^{(1)}) \le f(x^{(n+2)}) < f(x^{(n)}) \tag{C.21}$$

then $x^{(n+2)}$ should replace $x^{(n+1)}$. By reordering the corners according to their cost-function values and iterating the procedure, the polytope moves toward the minimum. Notice that this algorithm is usually not competitive with gradient methods in cases where gradients can be calculated.
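The reflection step (C.19) to (C.21) is the core of the Nelder-Mead algorithm. The complete method, sketched below in standard-library Python, adds expansion, contraction and shrink steps that the text does not describe. The coefficients (1, 2, 0.5, 0.5), the axis-aligned rather than regular starting polytope, and the stopping rule are conventional assumptions. The test function is hypothetical.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Nelder-Mead polytope search with reflection, expansion, contraction and shrink."""
    n = len(x0)
    # Initial polytope: x0 plus n points displaced along the coordinate axes
    pts = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        pts.append(p)
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: vals[k])   # f(x1) <= ... <= f(x_{n+1})
        pts = [pts[k] for k in order]
        vals = [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:
            break
        worst = pts[-1]
        c = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]   # centroid, Eq. (C.19)
        xr = [c[j] + (c[j] - worst[j]) for j in range(n)]         # reflection, Eq. (C.20)
        fr = f(xr)
        if vals[0] <= fr < vals[-2]:                              # acceptance test, Eq. (C.21)
            pts[-1], vals[-1] = xr, fr
        elif fr < vals[0]:                                        # new best point: try expanding
            xe = [c[j] + 2.0 * (c[j] - worst[j]) for j in range(n)]
            fe = f(xe)
            pts[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:                                                     # contract toward the centroid
            xc = [c[j] + 0.5 * (worst[j] - c[j]) for j in range(n)]
            fc = f(xc)
            if fc < vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:                                                 # shrink toward the best corner
                best = pts[0]
                pts = [best] + [[best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                                for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    return pts[0], vals[0]


# Hypothetical test function with minimum at (1, 0); no derivatives are used
x_min, f_min = nelder_mead(lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2, [0.0, 0.0])
print(x_min, f_min)
```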


C.7 PARAMETRIC OPTIMIZATION 

We take the approximate maximum-likelihood identification as an example of 
optimization methods applied to parametric identification. 




Figure C.2 LS and ML identification of an object described by the ARMAX model $y_k + 0.9y_{k-1} = 0.1u_{k-1} + e_k + 0.7e_{k-1}$. The lower graphs show the squared residuals $\varepsilon_k^2$ for the least-squares (LS) and maximum-likelihood (ML) estimation methods, respectively.

Figure C.3 Block diagram showing the approximate maximum likelihood estimation of an ARMAX model. Minimization of the prediction error is obtained by adjustment of $A$, $B$, and $C$.

Example C.2—Approximate ML parameter estimation
Autoregressive moving average models with exogenous input (ARMAX) constitute a model set general enough to describe colored noise that cannot be described by the ARX models of the type (6.7). Similar to (6.4), we consider ARMAX models of the type

$$A(z^{-1})\,y_k = z^{-d}B(z^{-1})\,u_k + C(z^{-1})\,v_k \tag{C.22}$$

where the noise covariance matrix $\Sigma_v = \mathcal{E}\{vv^T\}$ is now assumed to be unknown. Formulating a maximum-likelihood problem involves formulating a likelihood function $L(\theta)$. Considering the case of normally distributed noise, we have

$$L(\theta) = \frac{1}{(2\pi)^{N/2}(\det\Sigma_v)^{1/2}}\exp\left(-\tfrac{1}{2}\varepsilon^T(\theta)\,\Sigma_v^{-1}\,\varepsilon(\theta)\right) \tag{C.23}$$

or

$$\log L(\theta) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log\det\Sigma_v - \frac{1}{2}\varepsilon^T(\theta)\,\Sigma_v^{-1}\,\varepsilon(\theta) \tag{C.24}$$

with $\varepsilon$ containing the components $\varepsilon_k = y_k - \varphi_k^T\theta$ and

$$y_k = -a_1 y_{k-1} - \cdots - a_{n_A}y_{k-n_A} + b_1 u_{k-d-1} + \cdots + b_{n_B}u_{k-d-n_B} + v_k + c_1 v_{k-1} + \cdots + c_{n_C}v_{k-n_C} = \varphi_k^T\theta + v_k \tag{C.25}$$

with

$$\varphi_k = \begin{pmatrix} -y_{k-1} & \cdots & -y_{k-n_A} & u_{k-d-1} & \cdots & u_{k-d-n_B} & v_{k-1} & \cdots & v_{k-n_C} \end{pmatrix}^T, \qquad \theta = \begin{pmatrix} a_1 & \cdots & a_{n_A} & b_1 & \cdots & b_{n_B} & c_1 & \cdots & c_{n_C} \end{pmatrix}^T \tag{C.26}$$
One problem here is that the components $v_{k-1}, \ldots, v_{k-n_C}$ are not known. Another problem is that this optimization criterion is a function of both $\theta$ and the unknown noise covariance. In the absence of the true parameters, it is therefore only feasible to find approximate solutions, by computing successively better estimates of the covariance matrix $\Sigma_v$ and the parameters $\theta$ through some iterative procedure. In the important special case of normally distributed white noise with $\Sigma_v = \sigma^2 I$, we replace Eq. (6.34) by the empirical likelihood function

$$\log L(\theta,\sigma^2) = -\frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}\varepsilon_k^2(\theta) - \frac{N}{2}\log\sigma^2 = -\frac{N}{2}\log(2\pi) - \frac{1}{\sigma^2}V_N(\theta) - \frac{N}{2}\log\sigma^2 \tag{C.27}$$


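where

$$V_N(\theta) = \frac{1}{2}\sum_{k=1}^{N}\varepsilon_k^2(\theta) = \frac{1}{2}\varepsilon^T(\theta)\,\varepsilon(\theta) \tag{C.28}$$

The gradient and the second-order derivatives of $\log L(\theta,\sigma^2)$ determine the extrema of $\log L$ as

$$0 = \frac{\partial}{\partial\theta}\log L(\theta,\sigma^2) = -\frac{1}{\sigma^2}\nabla V_N(\theta) \tag{C.29}$$

with the solution

$$\nabla V_N(\theta) = 0 \tag{C.30}$$

A numerical solution to the problem $\nabla V_N(\theta) = 0$ can be obtained as an iterative procedure via the Newton (Newton-Raphson) method

$$\theta^{(i+1)} = \theta^{(i)} - \alpha^{(i)}\left(\nabla^2 V_N(\theta^{(i)})\right)^{-1}\nabla V_N(\theta^{(i)}) \tag{C.31}$$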
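where $\alpha^{(i)}$ is a step length to be chosen and $(i)$ denotes the iteration number. The elements of this computation can be given the form

$$\begin{aligned} V_N(\theta) &= \tfrac{1}{2}\sum_{k=1}^{N}\varepsilon_k^2(\theta), \qquad \psi_k(\theta) = -\nabla\varepsilon_k(\theta) \\ \nabla V_N(\theta) &= -\sum_{k=1}^{N}\psi_k(\theta)\,\varepsilon_k(\theta) \\ \nabla^2 V_N(\theta) &= \sum_{k=1}^{N}\psi_k(\theta)\,\psi_k^T(\theta) + \sum_{k=1}^{N}\varepsilon_k(\theta)\,\nabla^2\varepsilon_k(\theta) \end{aligned} \tag{C.32}$$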

The Newton method is a good numerical procedure with quadratic convergence properties. The method must, however, be modified when $\nabla^2 V_N$ is not invertible, which may be a problem at some distance from the solution. It is therefore wise to start the iterative algorithm with good initial values, obtained from some other numerical algorithm or from a least-squares estimate. For the same reason, it is difficult to assure global convergence of the Newton method with arbitrary initial conditions.

Let the expressions in Eq. (C.32) be substituted into

$$\theta^{(i+1)} = \theta^{(i)} - \alpha^{(i)}\left[\nabla^2 V_N(\theta^{(i)})\right]^{-1}\nabla V_N(\theta^{(i)}) \tag{C.33}$$

where $\alpha^{(i)}$ determines the step size in the iteration and is nominally equal to 1. An approximate gradient can be calculated as


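$$\nabla\varepsilon_k = \frac{\partial\varepsilon_k}{\partial\theta} = \begin{pmatrix} \partial\varepsilon_k/\partial a \\ \partial\varepsilon_k/\partial b \\ \partial\varepsilon_k/\partial c \end{pmatrix} = \frac{1}{C(z^{-1})}\begin{pmatrix} y_{k-1} & \cdots & y_{k-n_A} & -u_{k-d-1} & \cdots & -u_{k-d-n_B} & -\varepsilon_{k-1} & \cdots & -\varepsilon_{k-n_C} \end{pmatrix}^T \tag{C.34}$$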
Figure C.4 Typical iterations of Newton-Raphson optimization techniques when finding the parameters $(a, b, c) = (0.9, 0.1, 0.7)$ of a log-likelihood function. “True” parameters are indicated by dotted lines.


This expression is often written in the pseudo-regressor form

$$\psi_k(\theta^{(i)}) = -\nabla\varepsilon_k(\theta^{(i)}) \tag{C.35}$$

By iterating this estimation procedure, one obtains a filtering similar to that of Fig. C.3. Some iterations of the optimization of the log-likelihood function, based on the data shown in Fig. C.2, are shown in Fig. C.4 and Fig. C.5. It is clear from these graphs that the Newton methods perform better than the steepest descent search. ■
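The Newton iteration (C.33) with the gradient (C.34) can be sketched for a first-order ARMAX model of the same form as in Fig. C.2, with $d = 0$ and $n_A = n_B = n_C = 1$. This NumPy sketch drops the second sum in $\nabla^2 V_N$ of Eq. (C.32), which is the common Gauss-Newton approximation. It halves the step length until $V_N$ decreases and $C(z^{-1})$ remains stable. As recommended above, it starts from a least-squares ARX estimate. The random binary input, the sample size and the seed are assumptions, and the simulated data are not those of Fig. C.2.

```python
import numpy as np


def simulate(N, a, b, c, seed=0):
    """Simulate y_k + a y_{k-1} = b u_{k-1} + e_k + c e_{k-1} with unit noise variance."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=N)           # random binary input (assumption)
    e = rng.standard_normal(N)
    y = np.zeros(N)
    for k in range(1, N):
        y[k] = -a * y[k - 1] + b * u[k - 1] + e[k] + c * e[k - 1]
    return u, y


def residuals(theta, u, y):
    """Prediction errors from C(z^-1) eps_k = A(z^-1) y_k - B(z^-1) u_k."""
    a, b, c = (float(t) for t in theta)
    eps = np.zeros(len(y))
    for k in range(1, len(y)):
        eps[k] = y[k] + a * y[k - 1] - b * u[k - 1] - c * eps[k - 1]
    return eps


def pseudo_regressors(theta, u, y, eps):
    """psi_k = -grad eps_k: (-y_{k-1}, u_{k-1}, eps_{k-1}) filtered by 1/C(z^-1)."""
    c = float(theta[2])
    psi = np.zeros((len(y), 3))
    for k in range(1, len(y)):
        psi[k, 0] = -y[k - 1] - c * psi[k - 1, 0]
        psi[k, 1] = u[k - 1] - c * psi[k - 1, 1]
        psi[k, 2] = eps[k - 1] - c * psi[k - 1, 2]
    return psi


def gauss_newton_armax(u, y, n_iter=30):
    Phi = np.column_stack([-y[:-1], u[:-1]])       # ARX least squares gives initial values
    a0, b0 = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
    theta = np.array([a0, b0, 0.0])
    V = 0.5 * np.sum(residuals(theta, u, y) ** 2)   # V_N(theta), Eq. (C.28)
    for _ in range(n_iter):
        eps = residuals(theta, u, y)
        psi = pseudo_regressors(theta, u, y, eps)
        grad = -psi.T @ eps                         # gradient of V_N, Eq. (C.32)
        hess = psi.T @ psi                          # Gauss-Newton: second sum dropped
        step = np.linalg.solve(hess, grad)
        alpha = 1.0                                 # step length alpha^(i) in Eq. (C.33)
        while alpha > 1e-6:
            cand = theta - alpha * step
            if abs(cand[2]) < 1.0:                  # keep 1/C(z^-1) stable
                V_cand = 0.5 * np.sum(residuals(cand, u, y) ** 2)
                if V_cand < V:
                    break
            alpha *= 0.5
        else:
            break                                   # no decrease found: stop iterating
        theta, V = cand, V_cand
    return theta


u, y = simulate(10000, a=0.9, b=0.1, c=0.7)
theta_hat = gauss_newton_armax(u, y)
print(theta_hat)      # estimates of (a, b, c), which should lie near (0.9, 0.1, 0.7)
```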


C.8 BIBLIOGRAPHY AND REFERENCES 

Good general references for numerical optimization methods include 

[C1] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained
Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice- 
Hall, 1983. 
Figure C.5 Typical iterations of steepest descent optimization techniques when finding the parameters $(a, b, c) = (0.9, 0.1, 0.7)$ of a log-likelihood function. “True” parameters are indicated by dotted lines.
D.1 INTRODUCTION

A random variable or a stochastic variable has a value which depends on chance and which cannot be predicted from a knowledge of the experimental conditions. To describe the outcome of a random variable $X$, it is common practice to introduce the probability distribution function

$$F(x) = \mathcal{P}\{X \le x\}, \qquad 0 \le F(x) \le 1, \quad \forall x \in \mathbb{R} \tag{D.1}$$

where $\mathcal{P}\{X \le x\}$ denotes the probability that $X \le x$. In addition, $F(x)$ is a monotonically nondecreasing function of $x$. The derivative $f(x)$ of the distribution function $F(x)$ is called the probability density function. In cases when there is no risk of confusion, we also use $x$ to denote the random variable.

The statistical mean $\mu_y$, or expectation, of a variable $y$ which is a function of a random variable $x$ is defined as

$$\mu_y = \mathcal{E}\{y\} = \int_{-\infty}^{\infty} y(x)\,f(x)\,dx \tag{D.2}$$

and the mean of the distribution is

$$\mu_x = \mathcal{E}\{x\} = \int_{-\infty}^{\infty} x\,f(x)\,dx \tag{D.3}$$

The variance of a scalar variable $y(x)$ is defined as

$$\sigma_y^2 = \operatorname{Var}\{y\} = \mathcal{E}\{(y - \mu_y)^2\} = \int_{-\infty}^{\infty}\left(y(x) - \mu_y\right)^2 f(x)\,dx \tag{D.4}$$

and the covariance between two variables $x$ and $y$ is

$$\operatorname{Cov}\{x,y\} = \mathcal{E}\{(x - \mu_x)(y - \mu_y)\} = \mathcal{E}\{xy\} - \mu_x\mu_y \tag{D.5}$$

In the case of a vector-valued variable $y(x)$, it is standard to use the definition of covariance

$$V\{y\} = \mathcal{E}\{(y - \mu_y)(y - \mu_y)^T\} = \int_{-\infty}^{\infty}\left(y(x) - \mu_y\right)\left(y(x) - \mu_y\right)^T f(x)\,dx \tag{D.6}$$

because it also describes the statistical relations between the components of the vector $y$.

Definition D.1—Statistical covariance and correlation

The correlation coefficient between two variables $x$ and $y$ is

$$\rho_{xy} = \frac{\operatorname{Cov}\{x,y\}}{\sigma_x\sigma_y} \tag{D.7}$$

and two variables $x$, $y$ are uncorrelated if

$$\operatorname{Cov}\{x,y\} = 0 \tag{D.8}$$

■
Example D.1—The normal or Gaussian distribution

Let $x \in \mathbb{R}^n$ denote a vector of random variables. A random vector $x$ is called normal (or Gaussian) if its probability density function is

$$f_x(x) = \frac{1}{(2\pi)^{n/2}(\det R)^{1/2}}\exp\left\{-\tfrac{1}{2}(x - \mu_x)^T R^{-1}(x - \mu_x)\right\} \tag{D.9}$$

with the mean $\mu_x$ and the covariance matrix $R$. This is denoted $x \in \mathcal{N}(\mu_x, R)$, with

$$\mu_x = \mathcal{E}\{x\}, \qquad R = \mathcal{E}\{(x - \mu_x)(x - \mu_x)^T\} = \begin{pmatrix} \operatorname{Cov}\{x_1,x_1\} & \cdots & \operatorname{Cov}\{x_1,x_n\} \\ \vdots & \ddots & \vdots \\ \operatorname{Cov}\{x_n,x_1\} & \cdots & \operatorname{Cov}\{x_n,x_n\} \end{pmatrix} \tag{D.10}$$

If the components of $x$ are uncorrelated, then the covariance matrix (D.10) becomes diagonal. Notice that the covariance matrix is symmetric and positive semidefinite, since for any constant vector $a \in \mathbb{R}^n$ we have

$$0 \le V\{a^T x\} = \operatorname{Cov}\{a^T x, a^T x\} = a^T V\{x\}\,a \tag{D.11}$$

A noteworthy special case is when $a^T V\{x\}a = 0$ for some $a \ne 0$, which means that the components of $x$ are linearly dependent. ■


Linear transformations 

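Consider the following linear transformation from $x \in \mathbb{R}^n$ to $y \in \mathbb{R}^m$, by means of a constant $m \times n$ matrix $A$ and a constant $m$-vector $b$

$$y = Ax + b \tag{D.12}$$

The mean $\mu_y$ and covariance of $y$ are then related to the mean and covariance of $x$ in the following manner

$$\begin{aligned} \mathcal{E}\{y\} &= A\,\mathcal{E}\{x\} + b = A\mu_x + b \\ V\{y\} &= \mathcal{E}\left\{\left((Ax + b) - (A\mu_x + b)\right)\left((Ax + b) - (A\mu_x + b)\right)^T\right\} = A\,V\{x\}\,A^T \end{aligned} \tag{D.13}$$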
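The relations (D.13) also hold exactly for sample means and sample covariances, which gives a quick numerical check. The NumPy sketch below uses hypothetical values of $A$, $b$, and of the correlated samples of $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 500
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.2, 0.3, 1.0]])
X = rng.standard_normal((N, n)) @ mixing      # N correlated samples of x (one per row)
A = np.array([[1.0, -1.0, 2.0],
              [0.0, 3.0, 1.0]])               # hypothetical m x n matrix, m = 2
b = np.array([5.0, -2.0])
Y = X @ A.T + b                               # y = A x + b for every sample, Eq. (D.12)

# Sample versions of Eq. (D.13)
print(np.allclose(Y.mean(axis=0), A @ X.mean(axis=0) + b))
print(np.allclose(np.cov(Y, rowvar=False), A @ np.cov(X, rowvar=False) @ A.T))
```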
D.2 STOCHASTIC PROCESSES


A function $x(t) = x(t,\omega)$ whose values depend on a random variable $\omega$ is called a random or stochastic process. For each value of time $t$, the function is a function of $\omega$ alone and, consequently, it is a random variable. For each fixed value of $\omega$, $x(t,\omega)$ depends only on $t$ and is thus an ordinary function of one real variable. Each such function is called a realization, trajectory, or sample function of the stochastic process. In discrete time, a stochastic process takes the form of an infinite sequence $\{x_k\}_{k=-\infty}^{\infty}$ or of a sequence $\{x_k\}_{k=0}^{N}$ over some interval of time; in both cases it is a time series.

Each discrete random variable $x_k$ should have some fixed probability distribution, usually assumed to be the normal distribution, with zero mean and variance $\sigma_x^2$. A sequence of mutually independent random variables $\{w_k\}_{k=-\infty}^{\infty}$ is called white noise in engineering terminology.

Definition D.2—White noise 

A sequence of $N$ uncorrelated stochastic variables $\{w_i\}_{i=1}^{N}$ with $\mathcal{E}\{w_i\} = 0$ and $\mathcal{E}\{w_i w_j\} = \delta_{ij}\sigma^2$ for all $i, j$ is known as white noise in the domain of time-series analysis. ■


Autocovariance and cross covariance 

An important special case of time-series analysis is when the dispersion of the inputs to a linear system may be described by stochastic distributions. The dependent variables, static or dynamic, then also behave as random variables. A major objective in applications is to describe the random process in such a way that predictions (in a probabilistic sense) of future values can be made. As the disturbances affecting a system are not known beforehand, it is important to consider various temporal covariances in order to make accurate predictions possible.

The cross covariance function and the autocovariance function are defined as

$$C_{xy}(t_1,t_2) = \operatorname{Cov}\{x(t_1), y(t_2)\}, \qquad C_{yy}(t_1,t_2) = \operatorname{Cov}\{y(t_1), y(t_2)\} \tag{D.14}$$

and the cross correlation function and autocorrelation function as

$$\rho_{xy}(t_1,t_2) = \frac{C_{xy}(t_1,t_2)}{\sqrt{C_{xx}(t_1,t_1)\,C_{yy}(t_2,t_2)}}, \qquad \rho_{yy}(t_1,t_2) = \frac{C_{yy}(t_1,t_2)}{\sqrt{C_{yy}(t_1,t_1)\,C_{yy}(t_2,t_2)}} \tag{D.15}$$
The stochastic processes $\{x_k\}$ and $\{y_k\}$ are said to have stationary correlations if $C_{xy}(t_1,t_2)$ depends on $\tau = t_1 - t_2$ only. In this case it is customary to denote the autocovariance and cross covariance functions by $C_{yy}(\tau)$ and $C_{xy}(\tau)$, respectively. For stochastic processes with stationary correlations it holds that

$$C_{xy}(\tau) = C_{yx}(-\tau) \tag{D.16}$$
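A sample version of the cross covariance function, with the lag convention $\tau = t_1 - t_2 = qh$, satisfies Eq. (D.16) exactly. In the NumPy sketch below, `cross_cov` is a hypothetical helper using the biased $1/N$ normalization. The test signals are made up for illustration, with $y$ a delayed, noisy copy of $x$.

```python
import numpy as np


def cross_cov(x, y, q):
    """Sample estimate of C_xy(qh) = Cov{x_{k+q}, y_k}; q may be negative."""
    N = len(x)
    xm, ym = x - x.mean(), y - y.mean()
    if q >= 0:
        return np.dot(xm[q:], ym[:N - q]) / N
    return np.dot(xm[:N + q], ym[-q:]) / N


rng = np.random.default_rng(2)
N = 5000
x = rng.standard_normal(N)
y = np.zeros(N)
y[2:] = x[:-2]                                 # y_k = x_{k-2} ...
y += 0.5 * rng.standard_normal(N)              # ... plus independent noise

# Eq. (D.16): C_xy(tau) = C_yx(-tau)
print(all(np.isclose(cross_cov(x, y, q), cross_cov(y, x, -q)) for q in range(-5, 6)))
print(round(cross_cov(y, x, 2), 2))            # about 1, since y lags x by two samples
```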

Stationary stochastic processes 

Definition D.3—Weakly stationary stochastic processes 

A random process $x(t,\omega)$ is called weakly stationary if its expectation $\mu_x(t) = \mathcal{E}\{x(t,\omega)\}$ is constant and independent of time $t$, and if the covariance function $C_{xx}(t_1,t_2)$ depends only on the time shift $\tau = t_1 - t_2$. ■

This definition sometimes takes on a slightly different form when applied to discrete stochastic processes. A discrete random process $\{x_k\}_{k=-\infty}^{\infty}$ is called weakly stationary if its expectation $\mu_k = \mathcal{E}\{x_k\}$ and covariance function $\mathcal{E}\{(x_k - \mu_k)(x_{k-q} - \mu_{k-q})^T\}$ are independent of $k$. The covariance function is then denoted by $C_{xx}(\tau) = C_{xx}(qh)$, where $\tau = qh$, $h$ is the sampling period, and $q$ is the number of samples corresponding to the time shift $\tau$.

Definition D.4—Strictly stationary stochastic processes

A stochastic process is said to be strictly stationary if the joint probability distribution of any set of $N$ observations $x_1, \ldots, x_N$ is the same as that associated with the $N$ observations $x_{1+k}, \ldots, x_{N+k}$, for any $k$ and any $N$. ■

Definition D.5—Uncorrelated stochastic processes 

Two stochastic processes $\{x_k\}$ and $\{y_k\}$ are uncorrelated if and only if the cross covariance function satisfies $C_{xy}(\tau) = 0$ for all $\tau$. ■

Spectra 

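The power spectrum of a stochastic process $x$ is the Fourier transform of the autocovariance function

$$S_{xx}(i\omega) = \mathcal{F}\{C_{xx}(\tau)\} \tag{D.17}$$

where $\omega$ is the angular frequency. The power cross spectrum between two stochastic processes $x$ and $y$ is

$$S_{xy}(i\omega) = \mathcal{F}\{C_{xy}(\tau)\} = \begin{cases} \displaystyle\int_{-\infty}^{\infty} C_{xy}(\tau)\,e^{-i\omega\tau}\,d\tau, & \text{continuous time} \\ \displaystyle h\sum_{q=-\infty}^{\infty} C_{xy}(qh)\,e^{-i\omega qh}, & \text{discrete time } (\tau = qh) \end{cases} \tag{D.18}$$

The cross spectrum is sometimes shown in a diagram as its amplitude $|S_{xy}|$ and its phase $\arg S_{xy}$. The inverse of Eq. (D.18) is

$$C_{xy}(\tau) = \begin{cases} \displaystyle\frac{1}{2\pi}\int_{-\infty}^{\infty} S_{xy}(i\omega)\,e^{i\omega\tau}\,d\omega, & \text{continuous time} \\ \displaystyle\frac{1}{2\pi}\int_{-\pi/h}^{\pi/h} S_{xy}(i\omega)\,e^{i\omega qh}\,d\omega, & \text{discrete time} \end{cases} \tag{D.19}$$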


Example D.2—Spectral density of a white-noise process 

Assume that a discrete white-noise process $\{w_k\}_{k=-\infty}^{\infty}$ satisfies

$$\mathcal{E}\{w_i\} = 0, \qquad \operatorname{Cov}\{w_i, w_j\} = \sigma^2\delta_{ij} \tag{D.20}$$

The spectral density is then

$$S_{ww}(i\omega) = h\sigma^2, \qquad -\frac{\pi}{h} \le \omega \le \frac{\pi}{h} \tag{D.21}$$

which is constant over the spectral range. In addition, the inversion formula (D.19) verifies that $C_{ww}(0) = \frac{1}{2\pi}\int_{-\pi/h}^{\pi/h} h\sigma^2\,d\omega = \sigma^2$. ■
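The transform pair (D.18) and (D.19) can also be checked numerically for a process that is not white. This NumPy sketch assumes the AR(1) process $y_k = a y_{k-1} + w_k$ with $|a| < 1$. Its covariance function $C_{yy}(qh) = \sigma^2 a^{|q|}/(1 - a^2)$ is a standard result. The sketch compares the truncated sum (D.18) with the closed form $h\sigma^2/|1 - a e^{-i\omega h}|^2$ and recovers $C_{yy}$ through (D.19). The values of $a$, $\sigma^2$ and $h$ are hypothetical.

```python
import numpy as np

a, sigma2, h = 0.9, 1.0, 0.5          # hypothetical AR(1) coefficient, noise variance, sampling period


def C_yy(q):
    """Covariance C_yy(qh) = sigma2 a^|q| / (1 - a^2) of y_k = a y_{k-1} + w_k."""
    return sigma2 * a ** np.abs(q) / (1.0 - a ** 2)


omega = np.linspace(-np.pi / h, np.pi / h, 2001)
qs = np.arange(-400, 401)             # truncation of the infinite sum in Eq. (D.18)
S_sum = np.array([h * np.sum(C_yy(qs) * np.exp(-1j * w * qs * h)) for w in omega]).real
S_closed = h * sigma2 / np.abs(1.0 - a * np.exp(-1j * omega * h)) ** 2
print(np.max(np.abs(S_sum - S_closed)))     # only round-off remains


def C_from_S(q):
    """Inverse transform, Eq. (D.19), by the trapezoidal rule over one period."""
    vals = S_closed * np.exp(1j * omega * q * h)
    dw = omega[1] - omega[0]
    integral = dw * (np.sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return (integral / (2.0 * np.pi)).real


print(C_from_S(0), C_yy(0))                 # both approximately 5.263
```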

Remark: Notice that the spectral density is often defined in the following slightly different way

$$\Phi_{xy}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} C_{xy}(\tau)\,e^{-i\omega\tau}\,d\tau \tag{D.22}$$

Effectively, the two definitions $S_{xy}$ and $\Phi_{xy}$ differ by a factor of $2\pi$, which can be regarded as a difference in the definition of the Fourier transform. ■

Linear stationary models 

Assume that $u(t) = \{u_k\}$ and $v(t) = \{v_k\}$ are uncorrelated weakly stationary stochastic processes, and that $y(t)$ is related to $u$ and $v$ via the convolution

$$y(t) = h(t) * u(t) + v(t) \tag{D.23}$$

and, for discrete-time variables,

$$y_k = \sum_{j=0}^{\infty} h_j u_{k-j} + v_k \tag{D.24}$$

where $h(\tau) = \{h_k\}$ is the weighting function. As Eq. (D.24) can also be interpreted as a convolution (D.23), we make no distinction between Eq. (D.23) and Eq. (D.24) in the case of discrete stochastic processes.
Let $\tau = qh$ denote a multiple $q$ of the sampling period $h$, and assume that the autocovariance function of $\{u_k\}$ is $C_{uu}(\tau)$. The mean value of $\{y_k\}$ is then constant and independent of time, and the cross covariance function between the output $y$ and the input $u$ is

$$\begin{aligned} C_{yu}(\tau) = C_{yu}(qh) &= \operatorname{Cov}\Big\{\sum_{j=0}^{\infty} h_j u_{k-j} + v_k,\; u_{k-q}\Big\} \\ &= \sum_{j=0}^{\infty} h_j \operatorname{Cov}\{u_{k-j}, u_{k-q}\} + \operatorname{Cov}\{v_k, u_{k-q}\} = h(\tau) * C_{uu}(\tau) \end{aligned} \tag{D.25}$$

since $\operatorname{Cov}\{v_k, u_{k-q}\} = 0$. Hence $C_{yu}$ does not depend on $k$. The same argument applied to $C_{yy}$ shows that the output $\{y_k\}$ is a weakly stationary random process.

In particular, if the input to a linear system is zero-mean white noise with $C_{uu}(qh) = \sigma^2\delta_{q0}$, then

$$C_{yu}(\tau) = \sigma^2 h(\tau) \tag{D.26}$$

where $h(\tau)$ is the weighting function.
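Eq. (D.26) is the basis of correlation analysis. With a white-noise input, the sample cross covariance between output and input estimates the weighting function directly. The NumPy sketch below assumes a hypothetical weighting function, noise level and sample size.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2 = 20000, 1.0
h_true = np.array([0.0, 1.0, 0.5, 0.25, 0.125])   # hypothetical weighting function h_0, ..., h_4
u = rng.normal(0.0, np.sqrt(sigma2), N)            # zero-mean white-noise input
v = 0.3 * rng.standard_normal(N)                   # disturbance uncorrelated with u
y = np.convolve(u, h_true)[:N] + v                 # y_k = sum_j h_j u_{k-j} + v_k, Eq. (D.24)


def C_yu(q):
    """Sample cross covariance Cov{y_k, u_{k-q}} for q >= 0 (both signals have zero mean)."""
    return np.dot(y[q:], u[:N - q]) / N


h_est = np.array([C_yu(q) for q in range(len(h_true))]) / sigma2   # Eq. (D.26)
print(np.round(h_est, 2))
```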


D.3 DIFFERENCE EQUATIONS 

Consider a stochastic process with an output that depends on previous outputs according to the difference equation

$$y_k = -a_1 y_{k-1} - a_2 y_{k-2} - \cdots - a_n y_{k-n} + w_k \tag{D.27}$$

or

$$A(z^{-1}) y_k = w_k, \qquad A(z^{-1}) = 1 + a_1 z^{-1} + \cdots + a_n z^{-n} \tag{D.28}$$

where $\{w_k\}$ is a sequence of uncorrelated stochastic variables with $\mathcal{E}\{w_k\} = 0$ for all $k$. Stochastic models according to Eq. (D.27) and Eq. (D.28) are known as autoregressive models. The transfer function $H(z) = 1/A(z^{-1})$ is stable if and only if the poles, i.e., the complex numbers $z_1, \ldots, z_n$ solving the equation $A(z^{-1}) = 0$, are strictly inside the unit circle, i.e., $|z_i| < 1$. The polynomial $A(z^{-1})$ is called the generating polynomial of the stochastic process.
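Checking the pole condition does not require explicit root finding. The sketch below uses the Schur-Cohn (step-down) recursion, a standard test that we substitute here for computing $z_1, \ldots, z_n$ directly: $A(z^{-1})$ with real coefficients has all zeros strictly inside the unit circle exactly when every reflection coefficient produced by the recursion has magnitude less than one. The function name and the coefficient-list convention $[a_1, \ldots, a_n]$ are our own.

```python
from typing import Sequence

def is_stable(a: Sequence[float]) -> bool:
    """Schur-Cohn step-down test for A(z^-1) = 1 + a1 z^-1 + ... + an z^-n.

    Returns True iff all zeros of z^n A(z^-1) satisfy |z| < 1,
    i.e. iff H(z) = 1/A(z^-1) is stable. Real coefficients assumed.
    """
    poly = [float(x) for x in a]          # poly[i] holds a_{i+1}
    while poly:
        k = poly[-1]                      # reflection coefficient k_m = a_m
        if abs(k) >= 1.0:
            return False
        m = len(poly)
        # Step down one order: a_i <- (a_i - k a_{m-i}) / (1 - k^2), i = 1..m-1
        poly = [(poly[i] - k * poly[m - 2 - i]) / (1.0 - k * k)
                for i in range(m - 1)]
    return True


print(is_stable([-0.9]))         # True:  y_k = 0.9 y_{k-1} + w_k
print(is_stable([-1.8, 0.9]))    # True:  poles 0.9 +/- 0.3i
print(is_stable([-1.5, 0.5]))    # False: pole at z = 1 (integral action)
```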

The output sequence $\{y_k\}_{k=-\infty}^{\infty}$ is a weakly stationary sequence with mean

$$\mu_y = \mathcal{E}\{y_k\} = \mathcal{E}\{-a_1 y_{k-1} - a_2 y_{k-2} - \cdots - a_n y_{k-n} + w_k\} = -\mu_y (a_1 + \cdots + a_n) \tag{D.29}$$
Figure D.1 Autoregressive process $y_k = 0.9 y_{k-1} + w_k$ with pole diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.


or

$$\mu_y A(1) = 0 \tag{D.30}$$

so that $\mu_y = 0$ except in the case when $A(1) = 0$, which corresponds to autoregressive dynamics with integral action. Notice that this case is precluded inasmuch as $A(z^{-1})$ belongs to the set of polynomials with all zeros strictly inside the unit circle; see Figs. D.1, D.2, and D.3.

The covariance function for an autoregressive stochastic process satisfies the following difference equation (Yule-Walker equation)

$$C_{yy}(kh) + a_1 C_{yy}((k-1)h) + \cdots + a_n C_{yy}((k-n)h) = 0, \qquad k = 1, 2, \ldots \tag{D.31}$$

with the initial condition

$$C_{yy}(0) + a_1 C_{yy}(h) + \cdots + a_n C_{yy}(nh) = \sigma^2 \tag{D.32}$$
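For a given generating polynomial, Eq. (D.32) together with Eq. (D.31) for $k = 1, \ldots, n$ forms $n + 1$ linear equations in the unknowns $C_{yy}(0), \ldots, C_{yy}(nh)$ once the symmetry $C_{yy}(-kh) = C_{yy}(kh)$ is used; the recursion (D.31) then extends the solution to larger lags. A minimal Python sketch, assuming a stable $A(z^{-1})$ and using a small hand-written Gaussian elimination (our choice, to keep the example dependency-free); the function names are our own:

```python
from typing import List, Sequence

def _solve(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for cc in range(col, n + 1):
                A[r][cc] -= f * A[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][cc] * x[cc] for cc in range(r + 1, n))) / A[r][r]
    return x


def ar_autocovariance(a: Sequence[float], sigma2: float, lags: int) -> List[float]:
    """C_yy(0), C_yy(h), ..., C_yy(lags*h) for A(z^-1) y_k = w_k, Eqs. (D.31)-(D.32)."""
    n = len(a)
    coeffs = [1.0] + list(a)                       # a_0 = 1, a_1, ..., a_n
    # Row k encodes sum_j a_j C_yy(|k - j| h) = sigma2 * delta_{k0}, k = 0..n
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        for j, aj in enumerate(coeffs):
            M[k][abs(k - j)] += aj
    c = _solve(M, [sigma2] + [0.0] * n)
    for k in range(n + 1, lags + 1):               # extend with the recursion (D.31)
        c.append(-sum(a[j - 1] * c[k - j] for j in range(1, n + 1)))
    return c[:lags + 1]


print(ar_autocovariance([-0.9], 1.0, 3))   # compare with 0.9**k / (1 - 0.81)
```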
Figure D.2 Autoregressive process $y_k = 1.8 y_{k-1} - 0.9 y_{k-2} + w_k$ with pole diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.


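Proof: By multiplying Eq. (D.27) by $y_{k-q}$ and calculating the expected values,

$$\begin{aligned} \mathcal{E}\{w_k y_{k-q}\} &= \mathcal{E}\{y_k y_{k-q} + a_1 y_{k-1} y_{k-q} + \cdots + a_n y_{k-n} y_{k-q}\} \\ &= C_{yy}(qh) + a_1 C_{yy}((q-1)h) + \cdots + a_n C_{yy}((q-n)h) \end{aligned} \tag{D.33}$$

where the left-hand side may be simplified to

$$\mathcal{E}\{w_k y_{k-j}\} = \sigma^2 \delta_{j0}, \qquad j = 0, 1, 2, \ldots \tag{D.34}$$

because $w_k$ and $y_{k-j}$ are uncorrelated for $j = 1, 2, 3, \ldots$, whereas $\mathcal{E}\{w_k y_k\} = \mathcal{E}\{w_k^2\} = \sigma^2$. ■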

The spectral density for an autoregressive process is

$$S_{yy}(i\omega) = \frac{\sigma^2 h}{A(e^{i\omega h}) A(e^{-i\omega h})} = \frac{\sigma^2 h}{|A(e^{i\omega h})|^2} \tag{D.35}$$

for $|\omega| < \omega_N = \pi/h$.

Figure D.3 Autoregressive process $y_k = -0.9 y_{k-1} + w_k$ with pole diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.
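The inversion formula (D.19) offers a simple numerical check of Eq. (D.35): integrating the spectral density over the band $|\omega| \le \pi/h$ and dividing by $2\pi$ should return the variance $C_{yy}(0)$. The Python sketch below does this with a midpoint rule (our choice; for a smooth periodic integrand it converges very quickly) and compares the result with $\sigma^2/(1 - a_1^2)$ from Example D.3. The sampling interval $h = 0.5$, the grid size, and the function names are illustrative assumptions.

```python
import cmath
import math
from typing import Sequence

def ar_spectrum(a: Sequence[float], sigma2: float, h: float, omega: float) -> float:
    """S_yy(i omega) = sigma2 * h / |A|^2 on the unit circle, Eq. (D.35)."""
    # A(z^-1) evaluated at z = e^{i omega h}; |A| is the same for e^{-i omega h} (real a_k)
    A = 1.0 + sum(ak * cmath.exp(-1j * omega * h * k) for k, ak in enumerate(a, start=1))
    return sigma2 * h / abs(A) ** 2


def variance_from_spectrum(a: Sequence[float], sigma2: float, h: float, n: int = 4096) -> float:
    """(1/2pi) * integral of S_yy over -pi/h..pi/h, evaluated by the midpoint rule."""
    w_n = math.pi / h                      # Nyquist frequency omega_N
    dw = 2.0 * w_n / n
    total = sum(ar_spectrum(a, sigma2, h, -w_n + (k + 0.5) * dw) for k in range(n))
    return total * dw / (2.0 * math.pi)


a1, h = -0.9, 0.5                          # illustrative values (assumed)
print(variance_from_spectrum([a1], 1.0, h), 1.0 / (1.0 - a1 ** 2))
```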

Example D.3—Yule-Walker equations for a first-order process 
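Consider the stochastic process with sampling interval $h$ and with the first-order autoregressive dynamics

$$y_k = 0.9 y_{k-1} + w_k, \qquad \mathcal{E}\{w_k\} = 0, \qquad \mathcal{E}\{w_i w_j\} = \sigma^2 \delta_{ij} \tag{D.36}$$

that is, $a_1 = -0.9$. The Yule-Walker equation and the initial condition give

$$\begin{aligned} C_{yy}(0) + a_1 C_{yy}(-h) &= \sigma^2 \\ C_{yy}(h) + a_1 C_{yy}(0) &= 0 \\ C_{yy}(kh) + a_1 C_{yy}((k-1)h) &= 0, \qquad k \ge 2 \end{aligned} \tag{D.37}$$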
Figure D.4 Moving average process $y_k = w_k + 0.9 w_{k-1}$ with zero diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.


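Since $C_{yy}(-h) = C_{yy}(h)$, the first two equations give the solution

$$\begin{pmatrix} C_{yy}(0) \\ C_{yy}(h) \end{pmatrix} = \begin{pmatrix} 1 & a_1 \\ a_1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \sigma^2 \\ 0 \end{pmatrix} = \frac{\sigma^2}{1 - a_1^2} \begin{pmatrix} 1 \\ -a_1 \end{pmatrix} \tag{D.38}$$

The difference equation for the autocovariance function then gives the explicit solution

$$C_{yy}(kh) = \frac{(-a_1)^{|k|}}{1 - a_1^2}\,\sigma^2 \tag{D.39}$$

and the autospectrum is

$$S_{yy}(i\omega) = \frac{1}{1 + a_1 e^{-i\omega h}} \cdot \frac{1}{1 + a_1 e^{i\omega h}}\,\sigma^2 h = \frac{\sigma^2 h}{1 + a_1^2 + 2 a_1 \cos \omega h} \tag{D.40}$$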


Moving average models 

Stochastic processes of the following type, where the weights $c_i$ are zero for $i > m$, are known as moving average processes

$$y_k = w_k + c_1 w_{k-1} + \cdots + c_m w_{k-m} \tag{D.41}$$

Figure D.5 Moving average process $y_k = w_k - 0.9 w_{k-1}$ with zero diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.

The transfer function that relates the output sequence $\{y_k\}$ to the input $\{w_k\}$ is

$$H(z) = C(z^{-1}) = c_0 + c_1 z^{-1} + \cdots + c_m z^{-m}, \qquad c_0 = 1 \tag{D.42}$$

Notice that the terminology “moving average” is somewhat misleading here, as there is no restriction that the coefficients should add to 1 or that they be nonnegative. An alternative description is finite impulse response (FIR) filter or all-zero filter.

The covariance function associated with Eq. (D.41) is

$$C_{yy}(qh) = \begin{cases} \sigma^2 \sum_{i=0}^{m-|q|} c_i c_{i+|q|}, & |q| \le m \\ 0, & |q| > m \end{cases} \tag{D.43}$$

Figure D.6 Moving average process $y_k = w_k - 1.8 w_{k-1} + 0.9 w_{k-2}$ with zero diagram, a realization with $\mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{w_k^2\} = \sigma^2 = 1$, theoretical (dotted line) and empirical (solid line) covariance function, and amplitude spectrum.


The spectral density is

$$S_{yy}(i\omega) = |C(e^{i\omega h})|^2 \sigma^2 h \tag{D.44}$$

Realizations of various MA processes are shown in Figs. D.4, D.5, and D.6.

Example D.4— Covariance functions for MA-processes 
Consider the two MA processes

$$\begin{aligned} y_k &= w_k + c\, w_{k-1} \\ x_k &= c\, w_k + w_{k-1} \end{aligned} \tag{D.45}$$

with the generating polynomials $1 + cz^{-1}$ and $c + z^{-1}$, respectively. The input sequence $\{w_k\}$ is assumed to be zero-mean white noise with variance $\sigma^2$. The corresponding covariance functions of the processes $\{y_k\}$ and $\{x_k\}$ are

$$C_{yy}(qh) = C_{xx}(qh) = \begin{cases} \sigma^2(1 + c^2), & q = 0 \\ \sigma^2 c, & |q| = 1 \\ 0, & |q| \ge 2 \end{cases} \tag{D.46}$$

Notice that the two covariance functions $C_{yy}$ and $C_{xx}$ are equal despite the different generating polynomials. ■
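The covariance function (D.43) of an MA process is the lag-$q$ correlation of the coefficient sequence with itself, which makes the result of Example D.4 easy to reproduce: reversing the coefficient order leaves every lag unchanged. A short Python sketch; the function name and the value $c = 0.5$ are chosen only for illustration.

```python
from typing import Sequence

def ma_autocovariance(c: Sequence[float], sigma2: float, q: int) -> float:
    """C_yy(qh) = sigma2 * sum_i c_i c_{i+|q|} for y_k = c_0 w_k + ... + c_m w_{k-m}."""
    q = abs(q)
    return sigma2 * sum(c[i] * c[i + q] for i in range(max(len(c) - q, 0)))


c = 0.5                  # illustrative value (assumed)
y_coeffs = [1.0, c]      # y_k = w_k + c w_{k-1}
x_coeffs = [c, 1.0]      # x_k = c w_k + w_{k-1}
for q in range(3):
    print(q, ma_autocovariance(y_coeffs, 1.0, q), ma_autocovariance(x_coeffs, 1.0, q))
```

Both columns agree at every lag, while the generating polynomials $1 + 0.5z^{-1}$ and $0.5 + z^{-1}$ have their zeros at $z = -0.5$ and $z = -2$, respectively.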


D.4 AUTOREGRESSIVE MOVING AVERAGE MODELS 


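Consider an autoregressive signal $\{x_k\}$ and noisy observations $\{y_k\}$ represented by the following stochastic process

$$\begin{aligned} x_k &= -a_1 x_{k-1} - \cdots - a_n x_{k-n} + v_k \\ y_k &= x_k + w_k \end{aligned} \tag{D.47}$$

where $\{v_k\}$ and $\{w_k\}$ are uncorrelated white-noise sequences with $\mathcal{E}\{v_k\} = \mathcal{E}\{w_k\} = 0$ and $\mathcal{E}\{v_k^2\} = \sigma_v^2$, $\mathcal{E}\{w_k^2\} = \sigma_w^2$. The spectral density of the noise-disturbed output $y$ is

$$S_{yy}(i\omega) = \frac{\sigma_v^2 h}{A(e^{i\omega h}) A(e^{-i\omega h})} + \sigma_w^2 h = h\,\frac{\sigma_v^2 + \sigma_w^2 A(e^{i\omega h}) A(e^{-i\omega h})}{A(e^{i\omega h}) A(e^{-i\omega h})} \tag{D.48}$$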


Spectral estimators based on the autoregressive model tend to give very poor results when applied to data generated by a system with observation noise of the type (D.47). A reason for this sensitivity can be sought in the spectral density (D.48), which is clearly characterized by poles as well as zeros. The presence of zeros in Eq. (D.48) is not compatible with the original autoregressive model, which motivates an extension of the model set to autoregressive moving average (ARMA) models of the type

$$y_k = -a_1 y_{k-1} - \cdots - a_n y_{k-n} + w_k + c_1 w_{k-1} + \cdots + c_m w_{k-m} \tag{D.49}$$

with the spectral density

$$S_{yy}(i\omega) = \frac{C(e^{i\omega h}) C(e^{-i\omega h})}{A(e^{i\omega h}) A(e^{-i\omega h})}\,\sigma^2 h \tag{D.50}$$

Spectral densities of the form (D.50) are sometimes called rational spectral densities (or rational spectra) as they derive from the rational function $C(z)/A(z)$.

The covariance function may be calculated from the Yule-Walker equations

$$C_{yy}(qh) = -a_1 C_{yy}((q-1)h) - \cdots - a_n C_{yy}((q-n)h) + C_{yw}(-qh) + c_1 C_{yw}((1-q)h) + \cdots + c_m C_{yw}((m-q)h) \tag{D.51}$$

where $C_{yw}(qh) = \operatorname{Cov}\{y_k, w_{k-q}\}$ satisfies the relationship

$$C_{yw}(qh) = -a_1 C_{yw}((q-1)h) - \cdots - a_n C_{yw}((q-n)h) + c_q \sigma^2 \tag{D.52}$$

and where $C_{yw}(qh)$ is equal to zero for $q < 0$ (and $c_q = 0$ for $q > m$).

Example D.5—Yule-Walker equation for an ARMA process 
Consider the first-order autoregressive moving average model

$$y_{k+1} = -a_1 y_k + w_{k+1} + c_1 w_k \tag{D.53}$$

The Yule-Walker equation gives

$$\begin{aligned} C_{yy}(0) + a_1 C_{yy}(-h) &= \sigma^2 + c_1(-a_1 + c_1)\sigma^2 \\ C_{yy}(h) + a_1 C_{yy}(0) &= c_1 \sigma^2 \\ C_{yy}(kh) + a_1 C_{yy}((k-1)h) &= 0, \qquad k \ge 2 \end{aligned} \tag{D.54}$$


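As $C_{yy}(-h) = C_{yy}(h)$, the first two equations above give a solvable system of linear equations

$$\begin{pmatrix} 1 & a_1 \\ a_1 & 1 \end{pmatrix} \begin{pmatrix} C_{yy}(h) \\ C_{yy}(0) \end{pmatrix} = \sigma^2 \begin{pmatrix} c_1 \\ 1 - a_1 c_1 + c_1^2 \end{pmatrix} \tag{D.55}$$

The solution to Eq. (D.55) and the recursive Yule-Walker equation provide the explicit solution

$$\begin{pmatrix} C_{yy}(0) \\ C_{yy}(h) \\ C_{yy}(qh) \end{pmatrix} = \frac{\sigma^2}{1 - a_1^2} \begin{pmatrix} 1 + c_1^2 - 2a_1 c_1 \\ (1 - a_1 c_1)(c_1 - a_1) \\ (1 - a_1 c_1)(c_1 - a_1)(-a_1)^{q-1} \end{pmatrix}, \qquad q \ge 1 \tag{D.56}$$

■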
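The closed-form result (D.56) can be cross-checked against the representation $y_k = \sum_{j \ge 0} \psi_j w_{k-j}$, where for this model $\psi_0 = 1$ and $\psi_j = (c_1 - a_1)(-a_1)^{j-1}$ for $j \ge 1$, so that $C_{yy}(qh) = \sigma^2 \sum_j \psi_j \psi_{j+q}$. The Python sketch below truncates that sum; the truncation length (assumed long enough that $|a_1|^{\text{terms}}$ is negligible), the function names, and the values $a_1 = -0.5$, $c_1 = 0.3$ are illustrative assumptions.

```python
def arma11_cov_closed(a1: float, c1: float, sigma2: float, q: int) -> float:
    """C_yy(qh) from Eq. (D.56) for y_{k+1} = -a1 y_k + w_{k+1} + c1 w_k, |a1| < 1."""
    q = abs(q)
    if q == 0:
        return sigma2 * (1 + c1 ** 2 - 2 * a1 * c1) / (1 - a1 ** 2)
    return sigma2 * (1 - a1 * c1) * (c1 - a1) * (-a1) ** (q - 1) / (1 - a1 ** 2)


def arma11_cov_impulse(a1: float, c1: float, sigma2: float, q: int, terms: int = 500) -> float:
    """Same covariance as sigma2 * sum_j psi_j psi_{j+q}, impulse response truncated."""
    psi = [1.0, c1 - a1]                 # psi_0 = 1, psi_1 = c1 - a1
    while len(psi) < terms:
        psi.append(-a1 * psi[-1])        # psi_j = -a1 psi_{j-1} for j >= 2
    q = abs(q)
    return sigma2 * sum(psi[j] * psi[j + q] for j in range(terms - q))


for q in range(4):                       # illustrative a1 = -0.5, c1 = 0.3 (assumed)
    print(q, arma11_cov_closed(-0.5, 0.3, 1.0, q), arma11_cov_impulse(-0.5, 0.3, 1.0, q))
```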


Spectral factorization 

It was shown in Eq. (D.48) that an autoregressive signal disturbed by white noise gives rise to an ARMA-type rational spectral density of the type (D.50) with a factor $C(z)/A(z)$ evaluated for $z = e^{-i\omega h}$. In fact, this result generalizes: it is possible to find a transfer function factor $H(z)$ for any rational spectral density generated by a state-space representation

$$\begin{aligned} x_{k+1} &= \Phi x_k + v_k \\ y_k &= C x_k + w_k \end{aligned} \tag{D.57}$$
where $\Phi$ is stable, $(\Phi, C)$ is observable, and where the independent white-noise sequences $\{v_k\}$ and $\{w_k\}$ have zero mean and covariances $\Sigma_v$ and $\Sigma_w$, respectively. This procedure is called spectral factorization and consists of solving the Riccati equation

$$P = \Phi P \Phi^T - \Phi P C^T (C P C^T + \Sigma_w)^{-1} C P \Phi^T + \Sigma_v \tag{D.58}$$

and evaluating the matrices

$$K = \Phi P C^T (C P C^T + \Sigma_w)^{-1}, \qquad \Sigma = C P C^T + \Sigma_w \tag{D.59}$$

and the transfer function

$$H(z) = C(zI - \Phi)^{-1} K + I \tag{D.60}$$


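The spectral density for $|\omega| < \omega_N = \pi/h$ is then

$$\begin{aligned} S_{yy}(i\omega) &= h\left(C(e^{i\omega h} I - \Phi)^{-1} \Sigma_v (e^{-i\omega h} I - \Phi)^{-T} C^T + \Sigma_w\right) \\ &= h\, H(e^{i\omega h})\, \Sigma\, H^T(e^{-i\omega h}) \end{aligned} \tag{D.61}$$

both for Eq. (D.57) and for the stochastic process

$$y_k = H(z) e_k, \qquad \mathcal{E}\{e_k\} = 0, \qquad \mathcal{E}\{e_k e_q^T\} = \Sigma\, \delta_{kq} \tag{D.62}$$

where $\{e_k\}$ is a white-noise sequence filtered by $H(z)$.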
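For a scalar system the factorization can be checked numerically: iterate the Riccati equation (D.58) to its fixed point, form $K$, $\Sigma$, and $H(z)$ from Eqs. (D.59)-(D.60), and compare $h\,H(e^{i\omega h})\Sigma H(e^{-i\omega h})$ with the spectral density computed directly from the state-space model (with the factor $h$ used elsewhere in this appendix). In the Python sketch below, the fixed-point iteration (rather than a closed-form Riccati solver), the iteration count, the function names, and the numerical values $\Phi = 0.9$, $C = 1$, $\Sigma_v = \Sigma_w = 1$, $h = 1$ are illustrative assumptions.

```python
import cmath
import math

def scalar_spectral_factor(phi: float, c: float, sv: float, sw: float, iters: int = 1000):
    """Fixed-point iteration of the scalar Riccati equation (D.58); returns P, K, Sigma."""
    p = sv
    for _ in range(iters):
        p = phi * p * phi - (phi * p * c) ** 2 / (c * p * c + sw) + sv
    k = phi * p * c / (c * p * c + sw)          # gain K, Eq. (D.59)
    sigma = c * p * c + sw                      # innovation variance Sigma, Eq. (D.59)
    return p, k, sigma


def spectra(phi, c, sv, sw, h, omega):
    """Return (direct state-space spectrum, factored spectrum) at frequency omega."""
    _, k, sigma = scalar_spectral_factor(phi, c, sv, sw)
    z = cmath.exp(1j * omega * h)
    direct = h * (c * c * sv / abs(z - phi) ** 2 + sw)
    H = lambda zz: c * k / (zz - phi) + 1.0     # Eq. (D.60), scalar case
    factored = h * (H(z) * sigma * H(1.0 / z)).real   # 1/z = e^{-i omega h}
    return direct, factored


for omega in (0.0, 0.7, math.pi):               # illustrative values (assumed)
    print(omega, spectra(0.9, 1.0, 1.0, 1.0, 1.0, omega))
```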


D.5 SAMPLE COVARIANCE FUNCTIONS AND SPECTRA 

The theoretical covariance functions and hence the corresponding spectra can be calculated, for instance, by solving the Yule-Walker equations. Calculation of the empirical counterparts, that is, computation of covariances and spectra from data, sometimes requires reformulation in order to account for the effects of finite data records.

The sample mean $\bar{x}$ based on $N$ samples $x_0, \ldots, x_{N-1}$ is

$$\bar{x} = \frac{1}{N} \sum_{k=0}^{N-1} x_k \tag{D.63}$$
Two standard covariance estimators based on $N$ samples with the constant sampling interval $h$ are

$$\begin{aligned} \hat{C}_{xx}(qh) &= \frac{1}{N-q} \sum_{k=q}^{N-1} (x_k - \bar{x})(x_{k-q} - \bar{x}) \\ \hat{C}'_{xx}(qh) &= \frac{1}{N} \sum_{k=q}^{N-1} (x_k - \bar{x})(x_{k-q} - \bar{x}) \end{aligned} \tag{D.64}$$

The major difference between the two estimators is the normalization factor, which takes on the values $N - q$ and $N$ in the two cases. In both cases it is required to make some correction for low-frequency trends which, at the least, entails removal of the constant sample mean level $\bar{x}$.

Notice that these sample covariance functions (D.64) are not derived from analytical considerations and are, indeed, chosen more because of their similarity to the theoretical counterpart than as a result of theoretical motivations.


Error analysis 

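Assume that $\mathcal{E}\{x_k\} = 0$. The mean values of the two covariance estimates are then

$$\mathcal{E}\{\hat{C}_{xx}(qh)\} = \mathcal{E}\Big\{\frac{1}{N-q} \sum_{k=q}^{N-1} x_k x_{k-q}\Big\} = \frac{1}{N-q} \sum_{k=q}^{N-1} \mathcal{E}\{x_k x_{k-q}\} = C_{xx}(qh), \qquad \mathcal{E}\{\hat{C}'_{xx}(qh)\} = \frac{N-q}{N}\, C_{xx}(qh) \tag{D.65}$$

Thus, $\hat{C}_{xx}(qh)$ is an unbiased estimator of $C_{xx}$, whereas $\hat{C}'_{xx}(qh)$ is only asymptotically unbiased as the record length tends to infinity. For large $N$, the covariances of the estimates are approximately

$$\begin{aligned} \operatorname{Cov}\{\hat{C}_{yy}(qh), \hat{C}_{yy}(rh)\} &\approx \frac{1}{N} \sum_{k=-\infty}^{\infty} \big[C_{yy}(kh)\, C_{yy}((k-q+r)h) + C_{yy}((k+r)h)\, C_{yy}((k-q)h)\big] \\ \operatorname{Cov}\{\hat{C}_{xy}(qh), \hat{C}_{xy}(rh)\} &\approx \frac{1}{N} \sum_{k=-\infty}^{\infty} \big[C_{xx}(kh)\, C_{yy}((k-q+r)h) + C_{xy}((k+r)h)\, C_{yx}((k-q)h)\big] \end{aligned} \tag{D.66}$$

Notice that the approximation in Eq. (D.66) refers to the case of normally distributed noise. It is apparent that covariance estimates at different lags are, in general, correlated.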
It can be concluded that sample autocovariance estimators with the normalization factor $N - q$ are unbiased, but their variance is larger than that of estimators with the normalization factor $N$.
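The two estimators in Eq. (D.64) differ only in the normalization, so one routine can return either. The Python sketch below removes the sample mean (D.63) and then applies the $N - q$ or $N$ normalization; the function name, the `unbiased` flag, and the zero-based indexing $x_0, \ldots, x_{N-1}$ are our conventions.

```python
from typing import List, Sequence

def sample_autocovariance(x: Sequence[float], max_lag: int, unbiased: bool = True) -> List[float]:
    """Sample autocovariances for lags 0..max_lag, Eq. (D.64).

    unbiased=True  -> normalization N - q (estimator C_xx)
    unbiased=False -> normalization N     (estimator C'_xx)
    """
    n = len(x)
    mean = sum(x) / n                                  # sample mean, Eq. (D.63)
    d = [xi - mean for xi in x]
    out = []
    for q in range(max_lag + 1):
        s = sum(d[k] * d[k - q] for k in range(q, n))
        out.append(s / (n - q) if unbiased else s / n)
    return out


data = [1.0, 2.0, 3.0, 4.0]                            # illustrative data (assumed)
print(sample_autocovariance(data, 1, unbiased=True))   # [1.25, 0.4166666666666667]
print(sample_autocovariance(data, 1, unbiased=False))  # [1.25, 0.3125]
```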


D.6 NONSTATIONARY STOCHASTIC MODELS

Time series collected in applications often exhibit a nonzero mean value and even systematic fluctuations of the mean value over the course of the time series. The nature of the fluctuating, nonzero mean value may be very diverse and may exhibit, for instance, linear trends or periodic behavior. In the theory of stochastic processes such behavior is known as a trend.

Trends can often be approached analytically if conditions remain stable over a certain period of time. A nonstationary time series $\{y_k\}$ may sometimes be split up into a trend series $\{f_k\}$ and a stationary residual series $\{x_k\}$ according to

$$y_k = f_k + x_k \tag{D.67}$$

where $\mathcal{E}\{x_k\} = 0$. It is often somewhat more difficult to suggest a decomposition for trends in the variance, although it is often suggested to take logarithms of the original time series.

The trend can sometimes be represented by a polynomial in time of low 
degree—for instance, an offset, a linear trend, or a parabolic trend with an 
increasing trend at the beginning of the time series and a decreasing trend at 
the end, or vice versa. If the trend is periodic, it might be represented by a
finite Fourier series. Of course, standard regression techniques can be used 
to fit such trends. 

For analytical reasons it is often desirable to remove low-frequency trends by subtracting some function fitted to data, by eliminating deterministic functions, or by means of filtering. Such elimination of constant offsets or linear trends usually presents no statistical or practical problems.
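A linear trend is the simplest case of the regression-based trend removal mentioned above. The Python sketch below fits $f_k = \alpha + \beta k$ by ordinary least squares and returns the residual series; the function name and the restriction to a first-degree polynomial are illustrative choices, not the only option.

```python
from typing import List, Sequence, Tuple

def remove_linear_trend(y: Sequence[float]) -> Tuple[List[float], Tuple[float, float]]:
    """Least-squares fit of f_k = alpha + beta*k; returns (residuals, (alpha, beta)).

    Requires at least two samples.
    """
    n = len(y)
    t_mean = (n - 1) / 2.0                      # mean of k = 0..n-1
    y_mean = sum(y) / n
    s_tt = sum((k - t_mean) ** 2 for k in range(n))
    s_ty = sum((k - t_mean) * (yk - y_mean) for k, yk in enumerate(y))
    beta = s_ty / s_tt
    alpha = y_mean - beta * t_mean
    residuals = [yk - (alpha + beta * k) for k, yk in enumerate(y)]
    return residuals, (alpha, beta)


# Illustrative pure trend (assumed): the fit recovers the line and leaves ~zero residuals
res, (alpha, beta) = remove_linear_trend([2.0 + 0.5 * k for k in range(10)])
print(alpha, beta, max(abs(r) for r in res))
```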

Seasonal or periodic trends can sometimes be eliminated by filtering the original data with the filter

$$\nabla_p = 1 - z^{-p}, \qquad p = \text{period of trend} \tag{D.68}$$

This filter effectively eliminates purely periodic trends, but sometimes it gives rise to spurious initial and end-point effects (see Fig. D.7). It is critical to compensate with the correct period of the data, as there is otherwise a risk of additional data distortion. For the same reason it is important to choose the sampling period such that the data period $p$ appears as a multiple of the sampling period.

Figure D.7 Filtering of a signal ($p = 12$) with periodic mean value extracted from the time series by means of the filter $1 - z^{-12}$. The residual time series exhibits stationary characteristics.
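The filter (D.68) is a lagged difference, $y_k - y_{k-p}$, so its effect is easy to see on a synthetic series. In the Python sketch below (the function name, the 12-sample pattern, and the offset are invented for illustration), differencing with the correct period removes the periodic mean exactly, differencing with a wrong period leaves a periodic residual, and the first $p$ samples are lost, which is one source of the end effects mentioned above.

```python
from typing import List, Sequence

def seasonal_difference(y: Sequence[float], p: int) -> List[float]:
    """Apply nabla_p = 1 - z^{-p}: returns y_k - y_{k-p} for k = p, ..., N-1."""
    return [y[k] - y[k - p] for k in range(p, len(y))]


pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 5, 6]   # one period, p = 12 (assumed)
y = [10.0 + v for v in pattern * 3]                    # offset + periodic mean
print(seasonal_difference(y, 12))                      # correct period: all zeros
print(seasonal_difference(y, 11)[:5])                  # wrong period: nonzero residual
```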

ARMAX models 

An important class of nonstationary stochastic processes consists of those in which a deterministic response to an external input and a stationary stochastic process are superimposed. This is relevant, for instance, when the external input cannot be effectively described by some probabilistic distribution.

The ARMA model can be extended by adding an external input $\{u_k\}$, which is usually considered to be known

$$\begin{aligned} y_k = {} & -a_1 y_{k-1} - \cdots - a_{n_A} y_{k-n_A} \\ & + b_1 u_{k-1} + \cdots + b_{n_B} u_{k-n_B} \\ & + w_k + c_1 w_{k-1} + \cdots + c_{n_C} w_{k-n_C} \end{aligned} \tag{D.69}$$
In polynomial form we can express Eq. (D.69) as

$$A(z^{-1}) y_k = B(z^{-1}) u_k + C(z^{-1}) w_k \tag{D.70}$$

Because of linearity, $\{y_k\}$ can be separated into one purely deterministic process $\{x_k\}$ and one purely stochastic process $\{v_k\}$

$$\left.\begin{aligned} A(z^{-1}) x_k &= B(z^{-1}) u_k \\ A(z^{-1}) v_k &= C(z^{-1}) w_k \end{aligned}\right\} \quad \Rightarrow \quad y_k = x_k + v_k \tag{D.71}$$

The type of decomposition (D.71), which separates the deterministic and stochastic processes, is known as the Wold decomposition.
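The decomposition (D.71) follows from linearity and can be confirmed by simulation: driving the difference equation (D.69) with $u$ and $w$ together gives the same output as adding the separately simulated responses. The Python sketch below assumes zero initial conditions; the function name, the coefficient values, the square-wave input, and the random seed are arbitrary illustrative choices.

```python
import random
from typing import List, Sequence

def armax_simulate(a: Sequence[float], b: Sequence[float], c: Sequence[float],
                   u: Sequence[float], w: Sequence[float]) -> List[float]:
    """Simulate Eq. (D.69) with zero initial conditions (an assumption).

    a = [a1..a_nA], b = [b1..b_nB] (input delayed one step), c = [c1..c_nC] (c0 = 1).
    """
    y: List[float] = []
    for k in range(len(u)):
        yk = w[k]
        yk -= sum(ai * y[k - i] for i, ai in enumerate(a, start=1) if k >= i)
        yk += sum(bi * u[k - i] for i, bi in enumerate(b, start=1) if k >= i)
        yk += sum(ci * w[k - i] for i, ci in enumerate(c, start=1) if k >= i)
        y.append(yk)
    return y


rng = random.Random(0)
N = 200
u = [1.0 if (k // 10) % 2 == 0 else -1.0 for k in range(N)]   # square-wave input (assumed)
w = [rng.gauss(0.0, 1.0) for _ in range(N)]
zeros = [0.0] * N
a, b, c = [-1.2, 0.5], [0.8, 0.3], [0.4]                      # illustrative, stable A

y = armax_simulate(a, b, c, u, w)        # full ARMAX output
x = armax_simulate(a, b, [], u, zeros)   # deterministic part: A x = B u
v = armax_simulate(a, [], c, zeros, w)   # stochastic part:    A v = C w
print(max(abs(yk - (xk + vk)) for yk, xk, vk in zip(y, x, v)))   # rounding error only
```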


D.7 PREDICTION AND RECONSTRUCTION 

Consider the problem of predicting the output $d$ steps ahead when the output $\{y_k\}$ is generated by the ARMA model

$$A(z^{-1}) y_k = C(z^{-1}) w_k \tag{D.72}$$

which is driven by a zero-mean white noise $\{w_k\}$ with covariance $\mathcal{E}\{w_i w_j\} = \sigma^2 \delta_{ij}$. In other words, assuming that observations $\{y_k\}$ are available up to the present time, how should the output $d$ steps ahead be predicted optimally?

Assume that the polynomials $A(z^{-1})$ and $C(z^{-1})$ are mutually prime with no zeros for $|z| \ge 1$. Let the $C$-polynomial be expanded according to the Diophantine equation

$$C(z^{-1}) = A(z^{-1}) F_d(z^{-1}) + z^{-d} G_d(z^{-1}) \tag{D.73}$$

which is solved by the two polynomials

$$\begin{aligned} F_d(z^{-1}) &= 1 + f_1 z^{-1} + \cdots + f_{n_F} z^{-n_F}, & n_F &= d - 1 \\ G_d(z^{-1}) &= g_0 + g_1 z^{-1} + \cdots + g_{n_G} z^{-n_G}, & n_G &= \max(n_A - 1, n_C - d) \end{aligned} \tag{D.74}$$

Using Eqs. (D.72) and (D.73) we find

$$y_{k+d} = F_d(z^{-1}) w_{k+d} + \frac{G_d(z^{-1})}{C(z^{-1})} y_k \tag{D.75}$$
Let $\hat{y}_{k+d|k}$ denote linear $d$-step predictors of $y_{k+d}$ based upon the measured information available at time $k$. As the term $F_d(z^{-1}) w_{k+d}$ of Eq. (D.75) is unpredictable at time $k$, it is natural to suggest the following $d$-step predictor

$$\hat{y}_{k+d|k} = \frac{G_d(z^{-1})}{C(z^{-1})} y_k \tag{D.76}$$

The prediction error satisfies

$$\varepsilon_{k+d} = \hat{y}_{k+d|k} - y_{k+d} = \frac{G_d(z^{-1})}{C(z^{-1})} y_k - \frac{A(z^{-1}) F_d(z^{-1}) + z^{-d} G_d(z^{-1})}{C(z^{-1})} y_{k+d} = -\frac{A(z^{-1}) F_d(z^{-1})}{C(z^{-1})} y_{k+d} = -F_d(z^{-1}) w_{k+d} \tag{D.77}$$

Let $\mathcal{E}\{\cdot \mid \mathcal{F}_k\}$ denote the conditional mathematical expectation relative to the measured information $\mathcal{F}_k$ available at time $k$. The conditional mathematical expectation and the variance of the $d$-step prediction error relative to the information available at time $k$ are

$$\begin{aligned} \mathcal{E}\{\hat{y}_{k+d|k} - y_{k+d} \mid \mathcal{F}_k\} &= \mathcal{E}\{-F_d(z^{-1}) w_{k+d} \mid \mathcal{F}_k\} = 0 \\ \mathcal{E}\{(\hat{y}_{k+d|k} - y_{k+d})^2 \mid \mathcal{F}_k\} &= \mathcal{E}\{(F_d(z^{-1}) w_{k+d})^2 \mid \mathcal{F}_k\} \\ &= \mathcal{E}\{(w_{k+d} + f_1 w_{k+d-1} + \cdots + f_{d-1} w_{k+1})^2\} \\ &= (1 + f_1^2 + \cdots + f_{n_F}^2)\, \sigma^2 \end{aligned} \tag{D.78}$$

It follows that the predictor (D.76) is unbiased and that the prediction error depends only on future, unpredictable noise components. It is straightforward to show that the predictor (D.76) achieves the lower bound (D.78) and that the predictor (D.76) is optimal in the sense that the prediction error variance is minimized.
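The Diophantine equation (D.73) can be solved by polynomial long division: $F_d$ consists of the first $d$ impulse-response coefficients of $C(z^{-1})/A(z^{-1})$, and $G_d$ collects what remains of $C - AF_d$ after the factor $z^{-d}$ is removed. The Python sketch below represents polynomials in $z^{-1}$ as coefficient lists with leading 1 (our convention; the function name is also ours) and returns the prediction-error variance (D.78) as well.

```python
from typing import List, Sequence, Tuple

def diophantine(a: Sequence[float], c: Sequence[float], d: int,
                sigma2: float = 1.0) -> Tuple[List[float], List[float], float]:
    """Solve C = A F_d + z^-d G_d, Eq. (D.73), for monic A = [1, a1, ...], C = [1, c1, ...].

    Returns (F_d, G_d, prediction-error variance (1 + f1^2 + ...) * sigma2).
    """
    n_a = len(a) - 1
    f: List[float] = []
    for j in range(d):                         # first d impulse-response terms of C/A
        cj = c[j] if j < len(c) else 0.0
        f.append(cj - sum(a[i] * f[j - i] for i in range(1, min(j, n_a) + 1)))
    n = max(len(c), len(a) + d - 1)
    r = [c[k] if k < len(c) else 0.0 for k in range(n)]
    for i, ai in enumerate(a):                 # r = C - A * F_d
        for j, fj in enumerate(f):
            r[i + j] -= ai * fj
    g = r[d:]                                  # the first d coefficients of r vanish
    return f, g, sigma2 * sum(fj * fj for fj in f)


# First-order model of Example D.6 with illustrative a1 = -0.5, c1 = 0.3 (assumed)
print(diophantine([1.0, -0.5], [1.0, 0.3], 1))   # F_1 = [1], G_1 = [c1 - a1]
print(diophantine([1.0, -0.5], [1.0, 0.3], 2))   # F_2 = [1, c1 - a1], G_2 = [-a1 (c1 - a1)]
```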

Example D.6 —An optimal predictor for a first-order model 
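Consider the first-order ARMA model

$$y_{k+1} = -a_1 y_k + w_{k+1} + c_1 w_k \tag{D.79}$$

The variance of a one-step-ahead predictor $\hat{y}_{k+1|k}$ is

$$\begin{aligned} \mathcal{E}\{(\hat{y}_{k+1|k} - y_{k+1})^2 \mid \mathcal{F}_k\} &= \mathcal{E}\{(\hat{y}_{k+1|k} + a_1 y_k - c_1 w_k)^2 \mid \mathcal{F}_k\} + \mathcal{E}\{w_{k+1}^2 \mid \mathcal{F}_k\} \\ &= \mathcal{E}\{(\hat{y}_{k+1|k} + a_1 y_k - c_1 w_k)^2 \mid \mathcal{F}_k\} + \sigma_w^2 \ge \sigma_w^2 \end{aligned} \tag{D.80}$$

where the cross term vanishes because $w_{k+1}$ is uncorrelated with all quantities available at time $k$. The optimal predictor satisfying the lower bound (D.80) is obtained from Eq. (D.80) as

$$\hat{y}_{k+1|k} = -a_1 y_k + c_1 w_k \tag{D.81}$$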
which, unfortunately, is not realizable as it stands because $w_k$ is not available for measurement. Therefore, the noise sequence $\{w_k\}$ has to be replaced by some function of the observed variable $\{y_k\}$. A linear predictor chosen according to Eq. (D.76) is

$$\hat{y}_{k+1|k} = \frac{G_1(z^{-1})}{C(z^{-1})} y_k = \frac{c_1 - a_1}{1 + c_1 z^{-1}} y_k \tag{D.82}$$

Let the difference between the predictors (D.82) and (D.81) be denoted

$$\delta_k = \hat{y}_{k+1|k} - (-a_1 y_k + c_1 w_k) \tag{D.83}$$

By direct substitution, using Eq. (D.79), it follows that

$$(1 + c_1 z^{-1}) \delta_k = 0 \quad \text{for all } k \tag{D.84}$$

and so

$$\delta_k = (-c_1)^k \delta_0 \tag{D.85}$$

for some initial value $\delta_0$. As $|c_1| < 1$ (the zero of $C$ lies strictly inside the unit circle), any initial difference between the two estimators disappears as $k$ grows and the predictor $\hat{y}_{k+1|k}$ approaches stationarity. Hence, the linear predictor (D.82) achieves the lower bound (D.80) and is, consequently, optimal. ■
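The convergence argument (D.84)-(D.85) can be observed directly in simulation. The Python sketch below generates data from Eq. (D.79), runs the realizable predictor (D.82) in its recursive form $\hat{y}_{k+1|k} = -c_1 \hat{y}_{k|k-1} + (c_1 - a_1) y_k$ from a deliberately arbitrary starting value, and records its difference from the unrealizable predictor (D.81), which is computable here only because the simulation knows $w_k$. The function name, parameter values, seed, and starting guess are illustrative assumptions.

```python
import random
from typing import List

def predictor_gap(a1: float, c1: float, n: int = 60, seed: int = 1,
                  initial_guess: float = 0.0) -> List[float]:
    """delta_k = (realizable predictor D.82) - (oracle predictor D.81), k = 0..n-1."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    y = [w[0]]
    for k in range(n):                                # y_{k+1} = -a1 y_k + w_{k+1} + c1 w_k
        y.append(-a1 * y[k] + w[k + 1] + c1 * w[k])
    yhat = initial_guess                              # arbitrary start for yhat_{1|0} (assumed)
    gaps = []
    for k in range(n):
        if k > 0:
            yhat = -c1 * yhat + (c1 - a1) * y[k]      # recursive form of Eq. (D.82)
        oracle = -a1 * y[k] + c1 * w[k]               # Eq. (D.81); needs the unobservable w_k
        gaps.append(yhat - oracle)
    return gaps


gaps = predictor_gap(a1=-0.5, c1=0.3)                 # illustrative values (assumed)
print(gaps[0], gaps[10], gaps[-1])                    # decays like (-c1)**k, Eq. (D.85)
```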


D.8 THE KALMAN FILTER 

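Consider the linear state-space model

$$\begin{aligned} x_{k+1} &= \Phi x_k + v_k, & x_k &\in \mathbb{R}^n \\ y_k &= C x_k + w_k, & y_k &\in \mathbb{R}^m \end{aligned} \tag{D.86}$$

where $\{v_k\}$ and $\{w_k\}$ are assumed to be independent zero-mean white-noise processes with covariances $\Sigma_v$ and $\Sigma_w$, respectively. It is assumed that $\{y_k\}$, but not $\{x_k\}$, is available for measurement and that it is desirable to predict $\{x_k\}$ from measurements of $\{y_k\}$.

Introduce the state predictor

$$\begin{aligned} \hat{x}_{k+1|k} &= \Phi \hat{x}_{k|k-1} - K_k (\hat{y}_k - y_k), & \hat{x}_{k+1|k} &\in \mathbb{R}^n \\ \hat{y}_k &= C \hat{x}_{k|k-1}, & \hat{y}_k &\in \mathbb{R}^m \end{aligned} \tag{D.87}$$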
The predictor (D.87) has the same dynamics matrix $\Phi$ as the state-space model (D.86) and, in addition, a correction term $K_k(\hat{y}_k - y_k)$ with a gain $K_k$ to be chosen. The prediction error is

$$\tilde{x}_k = x_k - \hat{x}_{k|k-1} \tag{D.88}$$

The prediction-error dynamics is

$$\tilde{x}_{k+1} = (\Phi - K_k C) \tilde{x}_k + v_k - K_k w_k \tag{D.89}$$

The mean prediction error is governed by the recursive equation

$$\mathcal{E}\{\tilde{x}_{k+1}\} = (\Phi - K_k C)\, \mathcal{E}\{\tilde{x}_k\} \tag{D.90}$$

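The mean square of the prediction error is governed by

$$\begin{aligned} \mathcal{E}\{\tilde{x}_{k+1} \tilde{x}_{k+1}^T\} &= \mathcal{E}\{((\Phi - K_k C)\tilde{x}_k + v_k - K_k w_k)((\Phi - K_k C)\tilde{x}_k + v_k - K_k w_k)^T\} \\ &= (\Phi - K_k C)\, \mathcal{E}\{\tilde{x}_k \tilde{x}_k^T\} (\Phi - K_k C)^T + \Sigma_v + K_k \Sigma_w K_k^T \end{aligned} \tag{D.91}$$

where it has been used that $\tilde{x}_k$, $v_k$, and $w_k$ are mutually uncorrelated. If we denote

$$P_k = \mathcal{E}\{\tilde{x}_k \tilde{x}_k^T\}, \qquad Q_k = \Sigma_w + C P_k C^T \tag{D.92}$$

then Eq. (D.91) is simplified to

$$P_{k+1} = \Phi P_k \Phi^T - K_k C P_k \Phi^T - \Phi P_k C^T K_k^T + \Sigma_v + K_k Q_k K_k^T \tag{D.93}$$

By completing squares of terms containing $K_k$ we find

$$P_{k+1} = \Phi P_k \Phi^T + \Sigma_v - \Phi P_k C^T Q_k^{-1} C P_k \Phi^T + (K_k - \Phi P_k C^T Q_k^{-1}) Q_k (K_k - \Phi P_k C^T Q_k^{-1})^T \tag{D.94}$$

where only the last term depends on $K_k$. Minimization of $P_{k+1}$ can be done by choosing $K_k$ such that the positive semidefinite $K_k$-dependent term in Eq. (D.94) vanishes. Thus $P_{k+1}$ achieves its lower bound for

$$K_k = \Phi P_k C^T (\Sigma_w + C P_k C^T)^{-1} \tag{D.95}$$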

and the Kalman filter takes the form

$$\begin{aligned} \hat{x}_{k+1|k} &= \Phi \hat{x}_{k|k-1} - K_k (\hat{y}_k - y_k) \\ \hat{y}_k &= C \hat{x}_{k|k-1} \\ K_k &= \Phi P_k C^T (\Sigma_w + C P_k C^T)^{-1} \\ P_{k+1} &= \Phi P_k \Phi^T + \Sigma_v - \Phi P_k C^T (\Sigma_w + C P_k C^T)^{-1} C P_k \Phi^T \end{aligned} \tag{D.96}$$

which is the optimal predictor in the sense that the mean square error (D.91) is minimized in each step.

Example D.7—Kalman filter for a first-order system 

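Consider the state-space model

$$\begin{aligned} x_{k+1} &= 0.9 x_k + v_k \\ y_k &= x_k + w_k \end{aligned} \tag{D.97}$$

where $\{v_k\}$ and $\{w_k\}$ are zero-mean white-noise processes with covariances $\mathcal{E}\{v_k^2\} = 1$ and $\mathcal{E}\{w_k^2\} = 1$, respectively.

The Kalman filter (D.96) takes on the form

$$\begin{aligned} \hat{x}_{k+1|k} &= 0.9 \hat{x}_{k|k-1} - K_k (\hat{x}_{k|k-1} - y_k) \\ K_k &= \frac{0.9 P_k}{1 + P_k} \\ P_{k+1} &= 0.9^2 P_k + 1 - \frac{0.9^2 P_k^2}{1 + P_k} \end{aligned} \tag{D.98}$$

The result of one such realization is shown in Fig. D.8.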
Figure D.8 Kalman filter applied to one-step-ahead prediction of $x_{k+1}$ in Eq. (D.98). The observed variable $\{y_k\}$, the state $\{x_k\}$ and the predicted state $\{\hat{x}_k\}$, the estimated variance $\{P_k\}$ and the gain $\{K_k\}$, and the prediction error $\{\tilde{x}_k\}$ are shown in a 100-step realization of the stochastic process.

■
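Example D.7 is small enough to run directly. The Python sketch below simulates Eq. (D.97), applies the recursions (D.98), and compares the empirical mean square prediction error with $P_k$. For large $k$, $P_k$ should approach the positive root of $P^2 - 0.81P - 1 = 0$, obtained by setting $P_{k+1} = P_k$ in Eq. (D.98). The function name, the initial state $x_0 = \hat{x}_{0|-1} = 0$ (hence $P_0 = 0$), the record length, and the random seed are our assumptions.

```python
import math
import random

def kalman_example_d7(n: int = 20000, seed: int = 2):
    """Run the scalar Kalman filter (D.98) on simulated data from Eq. (D.97)."""
    rng = random.Random(seed)
    phi, sv, sw = 0.9, 1.0, 1.0
    x, xhat, P = 0.0, 0.0, 0.0           # assumed: x_0 known exactly, so P_0 = 0
    K = 0.0
    sq_err = 0.0
    for _ in range(n):
        y = x + rng.gauss(0.0, math.sqrt(sw))              # measurement y_k
        K = phi * P / (sw + P)                              # gain K_k
        xhat = phi * xhat - K * (xhat - y)                  # xhat_{k+1|k}
        P = phi ** 2 * P + sv - (phi * P) ** 2 / (sw + P)   # P_{k+1}
        x = phi * x + rng.gauss(0.0, math.sqrt(sv))         # true state x_{k+1}
        sq_err += (x - xhat) ** 2
    return P, K, sq_err / n


P, K, mse = kalman_example_d7()
P_ss = (0.81 + math.sqrt(0.81 ** 2 + 4.0)) / 2.0           # positive root, about 1.484
print(P, P_ss, K, mse)
```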







A Case Study 


E.1 INTRODUCTION 

The study† consists of an analysis of measurements related to human postural dynamics, which were investigated in six healthy subjects by means of a force platform recording body sway induced by vibrators attached to the calf muscles. The model of body mechanics adopted was that of an inverted pendulum, and posture control was quantified in three variables (swiftness, stiffness, and damping), which were assessed by means of parametric identification of a transfer function representing the stabilized inverted pendulum. The identification fulfills statistical validation criteria, and it is conjectured that the state feedback parameters identified are suitable for use in assessing the ability to maintain posture.

† Based on “Identification of Human Postural Dynamics” by R. Johansson, M. Magnusson, and M. Akesson, which appeared in IEEE Transactions on Biomedical Engineering, Vol. 35, No. 10, pp. 858-869, October 1988. ©1988 IEEE


E.2 SUMMARY 

Human posture control is maintained by proprioceptive, vestibular, and visual feedback, integrated within the central vestibular and locomotor system. Lesions to the sensory feedback system, or to the central nervous system, may impair postural control and equilibrium. It is therefore of interest to assess the ability of postural control by measuring the displacement of the body center of gravity. Recordings of the amplitude and frequency of spontaneous oscillations around the equilibrium position may describe the sway and thus, by extension, the control of posture. Normally, spontaneous oscillation appears in healthy individuals during stance, and the oscillating behavior of the body sway is often irregular or complex. Another problem is to analyze the response to an external disturbance in the presence of spontaneous motion. To understand the biological correlates of the posture control variables, it is also desirable to make a model-based analysis of the control system.

For the present study, we developed a model for posture control based on exposure of the subject to erroneous proprioceptive input. The stimulus is produced by vibration of the calf muscles, which results in activation of muscle spindles (see References [E7], [E12]). Vibration is believed to activate the muscle spindles as occurs during passive muscle stretch, which causes a reflex contraction. Body sway is measured with a force platform. The model adopted is that of the standing human body as an inverted pendulum, equipped with a servo-mechanism for balance. The model is designed so that the spectral analysis is compatible with a dynamic systems approach, and Laplace transform methods are used for transient input-output analysis. Parametric estimation is done with maximum-likelihood estimation of coefficients in ARMAX models. Model fitness and parameter uncertainty are analyzed statistically. The aim of the present study is to identify feedback parameters useful in evaluating the ability to maintain posture control.
E.3 METHODS AND MATERIALS

Tests were done on naive human subjects, three males and three females 
(mean age 28; range 23-39 years), none of whom had any history of vertigo, 
central nervous disorder, ear disease, or injury to the lower extremities. At 
investigation, no subject was on any form of medication or had consumed 
alcoholic beverages for at least 48 hours. 

Equipment and experimental setup 

The equipment consisted of a square force platform connected to a computer for data recording and computation. The platform is equipped with strain gauges that measure the vertical force at four symmetrically located points, one at each corner. Measurements obtained from the strain gauges are recorded by the computer and represent the differential distribution of forces exerted by the feet on the platform. The equipment allows simultaneous recording of body sway in both the sagittal and frontal planes, i.e., longitudinal and lateral motion. The stimulus is produced by vibration of the calf muscles at frequencies of 60 Hz and 100 Hz and an amplitude of 0.4 mm. The subject stood with heels together on the platform while staring at a spot on the opposite wall. A small vibrator was attached to the calf muscle in each leg with elastic straps. The subject stood erect, but not at attention, with closed or open eyes as instructed, and the recording was started. First, spontaneous sway was recorded. Then, the vibrators were turned on and off, modulated pseudorandomly (PRBS) according to a program executed in the computer, while recording continued.

The frequency of the vibrators depended linearly on the input voltage $v$, a relationship that had been checked for all vibrators before use. As part of routine laboratory practice, it was verified that there was no interference (aliasing) between the sampling frequency and the vibration frequency. The test sequence took 180 s.


E.4 MODELING OF POSTURE CONTROL 

When exposed to a sagittal perturbation, a subject may regain equilibrium by two different strategies: “ankle strategy,” in which muscular forces rotate the body around the ankle joint; or “hip strategy,” involving flexion at the hips and knees. The “ankle strategy” is sufficient to counteract minor perturbations that occur during natural stance, and it fits the model of posture control as an inverted pendulum. Hip strategy has to be employed when large correction forces are needed. In hip strategy there is a potential problem with shear forces against the supporting surface. However, the force platform has been constructed so that shear forces do not interfere with the recorded signal. The moment of inertia may also change in pronounced movements because the center of body mass will be lowered. Thus, it is arguable that where gross compensatory movements are concerned (e.g., preventing the subject’s imminent fall), the inverted pendulum model may be insufficient. However, in natural stance and in the minor perturbations induced by the vibratory stimulus used here, the inverted pendulum model is fully adequate to account for the corrective movements used to control body posture.

Figure E.1 Inverted pendulum model of human postural dynamics with the balancing torque $T_{bal}$ similar to that achievable with a spring ($k$) and a dashpot ($\eta$).


The model is formulated for dynamics in the sagittal plane with the body conceived of as an inverted pendulum. The inverted pendulum has an unstable equilibrium point at $\theta = 0$ (see Fig. E.1), which means that active stabilizing forces must compensate for deviations in position in order to maintain posture. The balancing forces exerted are the result of a complex event involving all body muscles acting in concert. A model of balance as a servo-mechanism need not, however, be more complicated than is needed to describe the resulting behavior as reflected in the measurements.

The model, then, consists of an inverted pendulum to explain the pure body mechanics and a balance control system which acts like the shock absorber of a motor vehicle. The “suspension” is characterized by a spring constant $k$ and a damping $\eta$, which keep the body in an upright position and capable of counteracting disturbance. The response to an impulse is determined by the values of $k$ and $\eta$, as well as $m$ (body mass) and $l$ (distance of the body center of mass from the platform surface). The following assumptions are made in order to formalize and simplify the analysis.

Assumption 1: The body is stiff, and has a mass m [kg]. 

Assumption 2: The body center of mass is located at distance l [m] from 
the platform surface. 

Assumption 3: There is a dynamic equilibrium between the torque of the 
foot and the forces acting on the “pendulum.” 

A person who does not counteract the forces of gravity may be modeled by the force equilibrium of an inverted pendulum. Introduce $J$ as the body moment of inertia around the ankle; the tangential torque equilibrium for a standing person subject to gravitation $g$ is then

$$J \frac{d^2\theta}{dt^2} = mgl \sin\theta(t), \qquad J = ml^2 \tag{E.1}$$

It is easy to understand both mathematically and intuitively that there is no stable equilibrium at $\theta = 0$. A person who does not counteract the gravitational torque with a stabilizing response will inevitably fall. The following two assumptions are introduced to model balance action and the effect of disturbances from the environment.

Assumption 4: Assume that there is a stabilizing ankle torque, $T_{bal}(t)$.

Assumption 5: Assume that there is a disturbance torque, $T_d(t)$, from the environment.

The torque balance now has the form

$$J \frac{d^2\theta}{dt^2} = mgl \sin\theta(t) + T_{bal}(t) + T_d(t), \qquad J = ml^2 \tag{E.2}$$

We assume that PID-control (proportional, integrating, derivative) via the ankle torque $T_{bal}$ is sufficient to represent the nature of the stabilizing control.

Assumption 6: Assume that $T_{bal}$ stabilizes the posture with PID-control with the components P, I, D determined by coefficients $k$, $\eta$, and $\rho$:

P: $-mgl \sin\theta(t) - kJ\theta(t)$
I: $-\rho J \int_{t_0}^{t} \theta(t)\, dt$
D: $-\eta J \dot{\theta}(t)$

Figure E.2 A sketch of a force platform used for postural measurements. The variables $T_x$ and $T_y$ denote the torques around the x- and y-directions, respectively.

PID control is chosen here because the proportional, derivative, and integral actions are fundamental modes of control. The components (P, D), with k, η > 0, are indispensable for stability according to the Routh stability criterion. The integral component (I) accounts for (slow) compensation of bias in θ; as (I) is not a priori necessary for stability, one of the aims of the experiment is to test for its presence. The parameter k may be interpreted as a spring constant, and η may be compared with the viscous damping of a dashpot. The parameter p may be interpreted as a constant for the slow reset action in the control system.
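For the cubic pole polynomial s^3 + ηs^2 + ks + p that results from PID feedback (Eq. (E.7) below), the first column of the Routh array is 1, η, (ηk - p)/η, p, so all poles lie in the left half-plane exactly when η > 0, p > 0, and ηk > p (which forces k > 0). A minimal Python sketch of that check follows, assuming the coefficients are already normalized by J as in the text; the function name is our own, and the example values are the Experiment C estimates for subject 1 in Table E.1.

```python
def routh_stable_cubic(eta: float, k: float, p: float) -> bool:
    """Routh test for A(s) = s^3 + eta*s^2 + k*s + p (hypothetical helper name).

    The first column of the Routh array is 1, eta, (eta*k - p)/eta, p; all
    roots lie in the open left half-plane iff every entry is positive.
    """
    if eta <= 0.0:
        return False                 # second entry of the first column fails
    return eta * k > p and p > 0.0   # third and fourth entries positive


# Experiment C, subject 1 (Table E.1)
print(routh_stable_cubic(5.24, 44.26, 45.64))  # True
# Without the derivative (D) action no choice of k and p stabilizes the loop
print(routh_stable_cubic(0.0, 44.26, 45.64))   # False
```

With p = 0 the check returns False because A(s) then has a root at s = 0; in that case the reduced second-order polynomial s^2 + ηs + k of Appendix E.1 applies, which is stable for η > 0 and k > 0.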

Finally, it is necessary to model the effect of the vibration stimulus. 

Assumption 7: The vibration v introduces erroneous input into the stabilizing system, causing misperception of the position θ (stretch) and the angular velocity dθ/dt (rate), so that the P and D actions of the feedback system are modified to

P: $-mgl\sin\theta(t) - kJ\theta(t) + b_1 v(t)$

D: $-\eta J\dot{\theta}(t) + b_2 v(t)$

where it is assumed that v disturbs both stretch and rate perception, but in different proportions, b1 and b2, respectively.
A transfer function

The torque equilibrium of Eq. (E.2) and Assumption A6 give the two equations

$$ J\frac{d^2\theta}{dt^2} = mgl\sin\theta(t) + T_{bal}(t) + T_d(t) $$

$$ T_{bal}(t) = -mgl\sin\theta(t) - kJ\theta(t) - \eta J\dot{\theta}(t) - pJ\int_{t_0}^{t}\theta(\tau)\,d\tau \qquad \text{(E.3)} $$


According to Eq. (E.3) there are three states that affect motion, namely the angular velocity dθ/dt, the angular position θ, and the bias compensation. A transfer function from the vibration stimulus V(s) = L{v(t)} and the disturbance T_d to the torque T_bal is found via Eqs. (E.2) and (E.3) for motion linearized around θ = 0, where sin θ ≈ θ (see Appendix E.1).


$$ T_{bal}(s) = \frac{(b_1+b_2)\left(s^3 - \frac{g}{l}s\right)}{s^3+\eta s^2+ks+p}\,V(s) - \frac{\eta s^2+\left(k+\frac{g}{l}\right)s+p}{s^3+\eta s^2+ks+p}\,T_d(s) \qquad \text{(E.4)} $$


It is of interest here to estimate the indispensable positive coefficients k and η, and to decide from data whether there is any integral action.
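Equation (E.4) can be evaluated numerically on the imaginary axis s = jω. The Python sketch below does so with plain complex arithmetic. The gain b1 + b2 and the distance l are not reported in the text, so the defaults b_sum = 1 and l = 1 m are hypothetical placeholders, and the function name is our own; η, k, p are the Table E.1 values for subject 1. At s = 0 the disturbance path equals -p/p = -1, so with integral action the ankle torque cancels a constant disturbance torque completely, whereas with p = 0 the static gain becomes -(k + g/l)/k.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def tbal_responses(s: complex, eta: float, k: float, p: float,
                   b_sum: float = 1.0, l: float = 1.0) -> tuple[complex, complex]:
    """Evaluate the two transfer functions of Eq. (E.4) at complex frequency s.

    Returns (T_bal/V, T_bal/T_d). b_sum = b1 + b2 and l [m] are hypothetical
    placeholder values, not estimates from the experiments.
    """
    a = s**3 + eta * s**2 + k * s + p                  # pole polynomial A(s)
    from_v = b_sum * (s**3 - (G / l) * s) / a          # stimulus path
    from_td = -(eta * s**2 + (k + G / l) * s + p) / a  # disturbance path
    return from_v, from_td


eta, k, p = 5.24, 44.26, 45.64  # Table E.1, subject 1
for omega in (0.0, 1.0, 3.57, 10.0, 100.0):
    h_v, h_d = tbal_responses(1j * omega, eta, k, p)
    print(f"omega = {omega:6.2f} rad/s  |T_bal/V| = {abs(h_v):.3f}  |T_bal/T_d| = {abs(h_d):.3f}")
```

Evaluating at p = 0 exactly at s = 0 would divide by zero (A(s) and the numerator share a factor s), so a small nonzero s such as 1e-6j is used for the reduced model.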


E.5 FORCES ON THE PLATFORM 

Before signal processing may proceed, it is necessary to establish the relationship of the measurement signal μ to the angular position θ. A static force equilibrium argument would go as follows: a signal that represents the center of force on the force platform is measured. With static equilibrium between the force on the platform and the body weight, it follows that the force center also represents the projection on the platform of the body center of gravity; see Fig. E.2.

However, such a model is not entirely satisfactory for the purposes of dynamic analysis, as the force center and the vertical projection of the center of body mass do not generally coincide. The foot may, for example, exert a corrective force on the platform to initiate an angular acceleration of the body. As described in Appendix E.2, the measurement μ is related to the torque T_bal for a given body mass m so that

$$ \mu = \frac{2\gamma}{a+b}\,T_{bal} + \gamma mg\,\frac{b-a}{a+b} \qquad \text{(E.5)} $$

for the support-point distances a and b, with a gain factor γ. This means that the measurement μ represents the ankle torque T_bal except for a gain factor and a bias term.
It is part of signal processing to compensate for the gain factor and the bias term in the recorded measurements.


E.6 A DYNAMIC RESPONSE CLASSIFICATION 


We have given one interpretation of the coefficients in terms of a mechanical model with a spring k and a dashpot effect η. Naturally, a more rapid reflex system requires a balanced increase of both spring action and damping action. It is therefore desirable to quantify mutually independent characteristics of motion. Normalization of the transfer function (E.4) with respect to frequency gives for the stimulus dependence

$$ T_{bal}(s) = \frac{(b_1+b_2)\left[\left(\frac{s}{\omega_0}\right)^3 - \frac{g}{l\omega_0^2}\left(\frac{s}{\omega_0}\right)\right]}{\left(\frac{s}{\omega_0}\right)^3 + \frac{\eta}{\omega_0}\left(\frac{s}{\omega_0}\right)^2 + \frac{k}{\omega_0^2}\left(\frac{s}{\omega_0}\right) + 1}\,V(s), \qquad \omega_0 = \sqrt[3]{p} \qquad \text{(E.6)} $$


A more functional characterization of the motion based on the transfer function properties may therefore be formulated using the concepts

• Swiftness: $\omega_0 = \sqrt[3]{p}$ [rad/s]

• Stiffness: $k/\omega_0^2$

• Damping: $\eta/\omega_0$

This classification describes the posture dynamics by one swiftness parameter and two stability parameters. The swiftness parameter is a bandwidth [rad/s] and provides information about the highest angular frequency of the disturbance for which the posture control system gives adequate correction. The stiffness and damping are dimensionless stability parameters, independent of posture control swiftness because the dependence on ω0 is eliminated. A high value of swiftness means rapid response to disturbance, i.e., rapid compensation for small deviations from equilibrium. A high value of damping means good damping of sway velocity.
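The normalization is easy to reproduce. A minimal Python sketch follows (the function name is our own); applied to the Experiment C estimates for subject 1, it reproduces the rounded values listed in Table E.1.

```python
def classify(eta: float, k: float, p: float) -> tuple[float, float, float]:
    """Swiftness, stiffness, and damping from the coefficients of A(s)."""
    w0 = p ** (1.0 / 3.0)       # swiftness: omega_0 = cube root of p [rad/s]
    stiffness = k / w0**2       # dimensionless
    damping = eta / w0          # dimensionless
    return w0, stiffness, damping


swift, stiff, damp = classify(5.24, 44.26, 45.64)   # Table E.1, subject 1
print(f"{swift:.2f} {stiff:.2f} {damp:.2f}")          # 3.57 3.47 1.47
```

Because ω0 sets the frequency scale, substituting s = ω0·σ turns A(s)/ω0^3 into σ^3 + damping·σ^2 + stiffness·σ + 1, which is why the last two numbers do not depend on swiftness.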


E.7 EXPERIMENTS 

Experiments were performed with six subjects to evaluate the model and the method. The first experiment tested the difference between performance with open eyes and that with closed eyes. Other experiments were performed to test the difference between two choices of stimulation frequency of the method by using asymmetric stimulation. Time and frequency domain properties of the stimulus are presented in the results section. The following recordings were made with a sampling interval of 0.04 s, i.e., a sampling frequency of 25 Hz.

Experiment A: The empty experiment to measure electronic offsets. 

Experiment B: A test sequence with a vibration stimulus of 100 Hz that is switched on and off according to a pseudorandom binary sequence (PRBS), the subject standing with open eyes; see Fig. E.3.

Experiment C: A test sequence with a vibration stimulus of 100 Hz that is switched on and off according to a PRBS, the subject standing with closed eyes; see Fig. E.4.

Experiment D: A test sequence with a vibration stimulus of 60 Hz that is switched on and off according to a PRBS, the subject standing with closed eyes.
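The text does not state how the PRBS was generated. A common choice is a maximal-length sequence from a linear feedback shift register (LFSR). The Python sketch below assumes a 7-bit register with feedback polynomial x^7 + x^6 + 1 (period 2^7 - 1 = 127) and a hold time of five 0.04 s samples per bit. The register length, the taps, the hold time, and the function name are all assumptions for illustration, not the settings used in Experiments B to D.

```python
def prbs7(length: int, seed: int = 0b1111111) -> list[int]:
    """Maximal-length PRBS from a 7-bit Fibonacci LFSR (taps 7 and 6).

    The sequence repeats every 2**7 - 1 = 127 samples and contains 64 ones
    and 63 zeros per period. Register length and taps are assumptions.
    """
    if not 0 < seed < 1 << 7:
        raise ValueError("seed must be a non-zero 7-bit integer")
    state, out = seed, []
    for _ in range(length):
        out.append(state & 1)                     # output the lowest bit
        feedback = (state ^ (state >> 1)) & 1     # taps 7 and 6 -> bits 0 and 1
        state = (state >> 1) | (feedback << 6)    # shift right, insert at top
    return out


# Hold each bit for 5 samples of 0.04 s (0.2 s per bit, an assumed switching interval);
# 1 = vibrators on, 0 = vibrators off.
bits = prbs7(127)
vibrator_on = [b for b in bits for _ in range(5)]
```

A maximal-length sequence has an almost flat power spectrum up to the bit rate and is hard for the subject to anticipate, which matches the motivation given later for a pseudorandom pattern.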


Data analysis was performed in the following order: 

• Autospectrum of
  - Stimulus v (vibration)
  - Response μ (force distribution in direction x)
• Cross spectrum between v and μ
• Coherence between v and μ
• Transfer function from v to μ computed from spectra
• Maximum-likelihood identification of an ARMAX model
• Validation by test of residuals
  - Changes of signs (χ²-test)
  - Autocorrelation (χ²-test)
  - Cross correlation between v and residuals (χ²-test)
  - Normal distribution of residuals
• Validation by simulation
• Translation from ARMAX model to continuous-time transfer function

Examples of responses are shown in Figs. E.3-E.7.


Figure E.3 Experiment B (open eyes, 100 Hz). Input voltage to vibrators versus 
time. Longitudinal response y and lateral response x versus time. 


E.8 RESULTS OF THE EXPERIMENTS 

Coherence between stimulus and response was tested for the different experiments. A detailed presentation of calculations and numerical results is given in Appendix E.3. It was found that coherence was lower for all experiments with open eyes (B) than with closed eyes. The response in frontal sway was also shown to be low for all subjects. Thus, computations of transfer functions based on such data are not to be recommended.

Figure E.4 Experiment C (closed eyes, 100 Hz). Input voltage to vibrators versus time. Longitudinal response y and lateral response x versus time.

The results with closed eyes and symmetric stimulation were quite convincing, with good coherence between the vibration stimulus v and body sway in the sagittal plane, i.e., longitudinal motion. This indicates that there is a reasonable response to vibration, at least in the absence of visual input. The continuous-time pole polynomial (transfer function denominator) of Eq. (E.6) was computed. The third-order model pole polynomial

$$ A(s) = s^3 + \eta s^2 + ks + p \qquad \text{(E.7)} $$

was fitted to data from Experiments C and D. The results in Table E.1 were obtained from Experiment C (closed eyes, 100 Hz).

These parameters characterize a very well damped regulation system. The 
dynamic response classification describes posture dynamics by one swiftness 
parameter and two stability parameters (see above). 

The results of the experiments are listed below with comments on good (+) or poor (-) properties of the present approach in estimating the ability to maintain posture control. The arguments for these conclusions are given in the previous section and in Appendix E.3.

+ There is acceptably strong coherence in sagittal plane motion with closed 
eyes. The power of the oscillation increases by a factor of two, which 
means that there is a reasonable response to the vibration stimulus. 
Figure E.5 Experiment B (closed eyes, 100 Hz). Vibrator input voltage power spectrum (1) versus frequency [rad/s]. Longitudinal sway power spectrum (2) and lateral sway power spectrum (3) versus angular frequency [rad/s].

+ There is weak coherence with open eyes. 

+ There is weak coherence to sway in the frontal plane. 

+ The data fit very well to a linear model. 

+ It is possible to identify the feedback parameters with good accuracy. 

+ The residual signal has a small oscillatory component of 0.2-0.3 Hz, which may correspond to breathing.

- The method is sensitive to asymmetry in stimulation.


E.9 DISCUSSION 

The identified coefficients k, η, and p of Assumption A6 represent different aspects of the posture control system. The amplitude of body sway may become large for a small k, whereas a large k gives good postural control of the angular position. The parameter η represents the damping of body sway. Too small a value of η means low damping of body sway, whereas a large value means rigidity. The parameter p represents the automatic reset, i.e., compensating action to eliminate bias in the angular position.

Figure E.6 Coherence spectra between input (100 Hz) and longitudinal body sway in the case of open eyes (upper left) and closed eyes (upper right). Coherence spectrum between input (60 Hz) and longitudinal body sway (lower left) and coherence spectrum between input (100 Hz) and lateral body sway (lower right). All spectra versus frequency [Hz].

With a combination of the parameters k, η, and p, a large variety of body sway patterns can be described. The proportional and derivative actions represented by the parameters k and η are indispensable to maintain stability. The third-order model is statistically validated; it is accurate and explains the data well. The strong cross covariance of the estimates of k and p constitutes a practical difficulty, however.

We have given one interpretation of the coefficients in terms of a mechanical model with a spring effect k and a dashpot effect η. The integral component of Eq. (E.3) and Assumption A6 is responsible for a slow reset action (see Reference [E2]), an action that is biologically feasible considering the anatomical and physiological background. Vestibular, visual, and somatosensory information reaches the spinal motoneurons from the vestibular nucleus via several vestibulospinal and reticulospinal tracts, with or without modulation in the cerebellum (see Reference [E10]). Spinal motoneurons are also influenced by interneurons with information from antagonistic muscles. An induced perturbation changes the visual, vestibular, and somatosensory inputs, which affect the spinal motoneurons at different latencies (see Reference [E10]). Naturally, a more rapid reflex system requires a balanced increase of both spring and damping action. It is therefore desirable to quantify mutually independent characteristics of motion. A more functional characterization of the motion based on the transfer function properties may be formulated via normalized parameters by means of the concepts of swiftness, stiffness, and damping. A high value of swiftness means rapid response to disturbances of equilibrium, and a high value of stiffness means small deviations from equilibrium. A high value of damping means good attenuation of sway velocity.

Figure E.7 Transfer functions from spectra for Experiments B and C with open eyes (solid line) and closed eyes (dashed line), respectively.

With the model presented here, the effect of vibration on muscle stretch perception cannot be distinguished from that on rate perception. The use of coherence functions makes it possible to quantify the relative importance of visual feedback vis-a-vis vestibular and proprioceptive feedback in different frequency ranges.

Table E.1 Results of parametric identification with data from Experiment C with 100 Hz vibration and closed eyes

Subject    η       k       p       Swiftness   Stiffness   Damping
1          5.24    44.26   45.64   3.57        3.47        1.47
2          5.18    26.11   14.32   2.43        4.43        2.13
3          1.37    19.26    1.75   1.21        13.3        1.14
4          6.00    33.16   26.15   2.97        3.76        2.02
5          4.56    26.35   17.13   2.58        3.97        1.77
6          6.53    60.53   99.01   4.63        2.83        1.41

The choice of suitable experimental conditions for future clinical development is not self-evident, although those applied here have proved reasonably satisfactory. The following aspects deserve further consideration: test duration, vibration amplitude (intensity), vibration frequency, vibration pattern, and the possibility of stimulating other muscle groups. A shorter test duration may be preferable for clinical purposes, as might different choices of vibration amplitude and frequency. The amplitude must be chosen so that the vibration stimulus does not induce any excessive sway or falling reactions. Thus, application of the method is naturally limited to patients who are able to tolerate additional loading of their postural control. Other choices of frequency and amplitude may give different but statistically acceptable results for each testing condition, though standardization of test conditions is, of course, necessary to permit comparison of results.

The stimulus in the present experiments is produced by vibration of the calf muscles, sway being recorded in the sagittal and frontal planes for longitudinal and lateral motion, respectively. The vibration caused only insignificant motion in the frontal plane. Vibration applied to other muscles may provide tests that induce two-dimensional motion. The choice of a pseudorandom vibration pattern with a flat power spectrum is not dictated by methodological considerations, though there is a certain advantage in having a stimulus that is unpredictable by the subject.

A future clinical application of the present approach is in patients in whom defective postural control may be suspected. A large body of test data from both normal subjects and subjects with well-defined lesions has to be analyzed to determine the reliability, sensitivity, and discriminatory power of the parameters. However, the coherence function is sufficiently large for good reproducibility to be expected.

Conclusions are only drawn from the coefficient values of the denominator polynomial, which means that attention is focused on effects of recovery from a perturbation, rather than on the onset of perturbation. This is important, because the stimulus intensity may vary, and there may be substantial interindividual variation in the primary effect of perturbations. The parameters of swiftness, stiffness, and damping presented here may therefore prove useful for interindividual comparison both in clinical practice and in research.


E.10 CONCLUSIONS 

A postural test involving a force platform has been analyzed quantitatively by means of a new method. The proposed model-oriented transfer function approach also allows the angular position θ (or displacement of the body center of gravity) as well as sway velocity to be computed from the measurements recorded with the force platform. Parameters to quantify the body's ability to maintain posture have been proposed, and the following conclusions are made.

• The ankle torque T_bal represents the body's feedback control to maintain stability. It is emphasized that the force platform measurement may best be understood as the feedback actuated by the body.






Case study Appendix E 


• A quantitative analysis of the feedback properties of posture control is made. The control action is analyzed with classical control concepts. It is shown that there is corrective action with respect to the angular position θ, the angular velocity dθ/dt, and a slow reset control of bias in θ.

• The results of computation show that the proposed quantifiers of posture k, η, and p may be estimated with good accuracy according to generally accepted statistical validation criteria.

• The model complexity is chosen as a linear system of order 3, which is sufficient to explain the outcome of measurements.

• The method is sensitive to symmetry of stimulation.

• The proposed model is compatible with earlier attempts to represent measurements of the posture dynamics by spectral analysis (see Reference [E5]). Spectral analysis supported by parametric identification is advantageous because it allows quantitative statistical analysis as well as physiological interpretation.

• The approach with parametric identification of a transfer function between stimulus and response can be made with higher confidence than can parametric analysis of spontaneous motion. The coherence function gives a measure of the dependence of the response on variations in the stimulus.


APPENDIX E.1 — TRANSFER FUNCTION 

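The torque equilibrium of Eq. (E.2) and Assumption A6 give the two equations

$$ J\frac{d^2\theta}{dt^2} = mgl\sin\theta(t) + T_{bal}(t) + T_d(t) $$

$$ T_{bal}(t) = -mgl\sin\theta(t) - kJ\theta(t) - \eta J\frac{d\theta}{dt} - pJ\int_{t_0}^{t}\theta(\tau)\,d\tau $$

Elimination of T_bal gives

$$ J\frac{d^2\theta}{dt^2} = -kJ\theta(t) - \eta J\frac{d\theta}{dt} - pJ\int_{t_0}^{t}\theta(\tau)\,d\tau + T_d(t) $$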

There are three states that affect motion, namely the angular velocity dθ/dt, the angular position θ, and the bias compensation. A Laplace transformation with zero initial conditions gives

$$ \theta(s) = \frac{1}{J}\,\frac{s}{s^3+\eta s^2+ks+p}\,T_d(s) \qquad \text{(E.8)} $$


With a vibration stimulus v(t), according to Assumption A7 there is one more transfer function, namely that from the stimulus v to θ:

$$ \theta(s) = \frac{b_1+b_2}{J}\,\frac{s}{s^3+\eta s^2+ks+p}\,V(s) + \frac{1}{J}\,\frac{s}{s^3+\eta s^2+ks+p}\,T_d(s) \qquad \text{(E.9)} $$


A reduced model without any integrating compensation (p = 0) gives the simplification

$$ \theta(s) = \frac{b_1+b_2}{J}\,\frac{1}{s^2+\eta s+k}\,V(s) + \frac{1}{J}\,\frac{1}{s^2+\eta s+k}\,T_d(s) $$

A transfer function from the vibration stimulus V(s) = L{v(t)} and the disturbance T_d to the torque T_bal is found via Eqs. (E.2) and (E.9) for motion linearized around the equilibrium θ = 0, where sin θ ≈ θ and, with J = ml^2 so that mgl/J = g/l,

$$ T_{bal}(s) \approx (Js^2 - mgl)\,\theta(s) - T_d(s) = \frac{(b_1+b_2)\left(s^3-\frac{g}{l}s\right)}{s^3+\eta s^2+ks+p}\,V(s) - \frac{\eta s^2+\left(k+\frac{g}{l}\right)s+p}{s^3+\eta s^2+ks+p}\,T_d(s) \qquad \text{(E.10)} $$

It is of interest here to estimate the indispensable positive coefficients k and η, and to decide from data whether there is any integral action.
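The role of the integral term can be checked by simulation. The Python sketch below integrates the linearized closed loop obtained above after eliminating T_bal, using explicit Euler steps for a unit step in T_d/J. The step size, horizon, unit disturbance, and function name are arbitrary choices of ours, and the coefficients are those of Eq. (E.25). With p > 0 the angle returns to zero, in agreement with the factor s in the numerator of Eq. (E.8); with p = 0 it settles at the offset 1/k (for T_d/J = 1) predicted by the reduced model.

```python
def simulate_theta(eta: float, k: float, p: float, td_over_j: float = 1.0,
                   t_end: float = 30.0, dt: float = 1e-3) -> float:
    """Final angle theta(t_end) of the linearized loop
        theta'' = -k*theta - eta*theta' - p*integral(theta) + T_d/J
    after a step disturbance, integrated with explicit Euler steps."""
    integral, theta, omega = 0.0, 0.0, 0.0   # states: integral of theta, theta, dtheta/dt
    for _ in range(round(t_end / dt)):
        accel = -k * theta - eta * omega - p * integral + td_over_j
        integral, theta, omega = (integral + dt * theta,
                                  theta + dt * omega,
                                  omega + dt * accel)
    return theta


print(simulate_theta(4.97, 49.4, 32.0))   # PID: close to zero
print(simulate_theta(4.97, 49.4, 0.0))    # PD only: close to 1/49.4
```

A fixed-step Euler scheme is used only because it keeps the sketch dependency-free; its equilibrium coincides with that of the continuous-time model, so the final values are unaffected by the step size.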


APPENDIX E.2 — FORCE BALANCES 

The distances a and b denote horizontal distances from the ankle point of rotation to each of the support points at the edges of the force plate. Let P_foot denote the pressure of the soles exerted on the force plate, and Ω denote the area of contact between the feet and the force platform (see Figs. E.8 and E.9). The forces F_a and F_b represent the support forces at the edges of the force plate. The measurements μ are force differences given by


$$ \mu = \gamma\,(F_a - F_b) \qquad \text{(E.11)} $$
Figure E.8 Anterior force F_a and posterior force F_b on the force plate.

Figure E.9 Sole pressure P_foot on the area Ω of the force plate in the xy-plane.

with γ as a gain factor due to the strain gauges and the electronics. The force equilibria related to the foot pressure P_foot on the support surface are thus

$$ \iint_{\Omega} P_{foot}(x,y)\,dx\,dy = mg \qquad \text{(E.12)} $$

and

$$ \iint_{\Omega} P_{foot}(x,y)\,dx\,dy = F_a + F_b \qquad \text{(E.13)} $$

on the body and on the force plate, respectively. The corresponding torque equilibria are

$$ T_{bal} = T_y = \iint_{\Omega} P_{foot}(x,y)\,x\,dx\,dy, \qquad T_x = \iint_{\Omega} P_{foot}(x,y)\,y\,dx\,dy \qquad \text{(E.14)} $$


The forces F_a and F_b act at the distances a and b from the origin, with the resulting torque

$$ -F_a a + F_b b + T_{bal} = 0 \qquad \text{(E.15)} $$

where the ankle torque equilibrium results in body sway given by

$$ J\frac{d^2\theta}{dt^2} = mgl\sin\theta(t) + T_{bal}(t), \qquad J = ml^2 \qquad \text{(E.16)} $$

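From Eqs. (E.11) to (E.13), we find for body mass m [kg] that

$$ F_a + F_b = mg, \qquad \gamma\,(F_a - F_b) = \mu \qquad \text{(E.17)} $$

Solving these equations with respect to F_a and F_b gives

$$ F_a = \tfrac{1}{2}mg + \frac{\mu}{2\gamma}, \qquad F_b = \tfrac{1}{2}mg - \frac{\mu}{2\gamma} \qquad \text{(E.18)} $$

With F_a and F_b, Eq. (E.15) expresses the torque T_bal as

$$ T_{bal} = aF_a - bF_b = \frac{a-b}{2}\,mg + \frac{a+b}{2\gamma}\,\mu \qquad \text{(E.19)} $$

Solving for μ shows that μ represents T_bal via the linear relation

$$ \mu = \frac{2\gamma}{a+b}\,T_{bal} + \gamma mg\,\frac{b-a}{a+b} \qquad \text{(E.20)} $$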

Calibration experiments give the values a + b = 0.327 [m] and γ = 0.044 [V/N].
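Equation (E.20) can be inverted to recover the ankle torque from the recorded signal. The Python sketch below uses the calibration values just quoted. The bias term γmg(b - a)/(a + b) cannot be computed from them alone, since only the sum a + b is given, so it is passed in as a separately estimated offset; how that offset is obtained is left open here, and the function name is our own.

```python
GAMMA = 0.044     # gain factor gamma [V/N], from the calibration experiments
A_PLUS_B = 0.327  # a + b [m], from the calibration experiments


def ankle_torque(mu: list[float], bias: float) -> list[float]:
    """Invert Eq. (E.20): T_bal = (a + b) / (2 * gamma) * (mu - bias).

    `bias` stands for gamma*m*g*(b - a)/(a + b) in volts and must be
    supplied from a separate estimate (hypothetical here).
    """
    scale = A_PLUS_B / (2.0 * GAMMA)   # about 3.72 N*m per volt
    return [scale * (x - bias) for x in mu]


print(ankle_torque([0.10, 0.25, -0.05], bias=0.05))
```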


APPENDIX E.3 — CALCULATIONS AND ANALYSIS 

The results of computation are presented in this section together with certain conclusions. The presentation essentially follows the order of computation given in Section E.7, and capital letters (A-D) refer to the experiments presented above. Graphical presentation of experimental results is given in Fig. E.3 and Fig. E.4 for open and closed eyes, respectively. It is noticeable from these graphs that the lateral response to the vibration stimulus is much smaller in amplitude than the longitudinal response.

Figure E.10 Final prediction error (FPE) for various model orders (upper left) for models describing behavior with closed eyes (x) and open eyes (o). Shown are the variance ratio between residuals and output (upper right), simulated output and output (lower left), and model error and output (lower right). All graphs versus model order.


Spectral analysis 

The autospectra (power spectra) show the frequency contents of the signals investigated (see Fig. E.5). Note that the frequencies of these spectra, which extend up to 12.5 Hz at the 25 Hz sampling rate, should not be confused with the vibration frequency (60 or 100 Hz) of the stimulus.

A coherence spectrum between input and recorded response variables was computed (see Fig. E.6). Recall that a coherence spectrum can be interpreted as a correlation analysis made for each frequency. A value close to 1 indicates that the input and the output are strongly correlated at that frequency. A coherence of 0.5 denotes that half of the output variation may be explained by variations in the stimulus input. It may be concluded from the coherence spectra that the coherence is quite satisfactory. The coherence is better for longitudinal sway than for lateral sway. This makes sense, as the vibrators are mounted on both calves to stimulate longitudinal motion. Also notice in Fig. E.6 that the coherence is low for frequencies below 0.05 Hz. This may indicate that the reflex reaction affected by the vibration stimulus operates with a bandwidth down to 0.05 Hz.

Figure E.11 Test of autocorrelation of residuals for model orders n = 1, ..., 4. Confidence interval (99%) is displayed.


Transfer functions from spectra 

Division of the cross spectrum between input v and output x by the autospectrum of v gives the transfer function, i.e., gain and phase lag for a range of frequencies (see Fig. E.7).
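The spectral computations of this appendix can be sketched with averaged periodograms. The Python/NumPy function below is a simplified Welch-type estimator (rectangular window, no overlap, a default of 8 segments chosen by us) returning the transfer function estimate S_vy/S_vv and the coherence |S_vy|^2/(S_vv S_yy). It is not the estimator used for Figs. E.5 to E.7; a production analysis would typically use tapered, overlapping segments. Averaging over segments is essential: with a single segment the coherence is identically 1.

```python
import numpy as np


def spectral_estimates(v, y, nseg: int = 8, fs: float = 25.0):
    """Averaged-periodogram estimates from input v to output y.

    Returns (frequencies [Hz], H = S_vy / S_vv, coherence |S_vy|^2 / (S_vv * S_yy)).
    Rectangular window, non-overlapping segments: a simplified Welch estimate.
    """
    v = np.asarray(v, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(v) // nseg                                   # samples per segment
    V = np.fft.rfft(v[:n * nseg].reshape(nseg, n), axis=1)
    Y = np.fft.rfft(y[:n * nseg].reshape(nseg, n), axis=1)
    s_vv = np.mean(np.abs(V) ** 2, axis=0)               # autospectrum of v
    s_yy = np.mean(np.abs(Y) ** 2, axis=0)               # autospectrum of y
    s_vy = np.mean(np.conj(V) * Y, axis=0)               # cross spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, s_vy / s_vv, np.abs(s_vy) ** 2 / (s_vv * s_yy)


rng = np.random.default_rng(0)
v = rng.standard_normal(4096)
y = 2.0 * v + 0.5 * rng.standard_normal(4096)   # synthetic data, not the experiment
freqs, h, coh = spectral_estimates(v, y)
```

The gain is |h| and the phase lag is -angle(h), which is the representation shown in Fig. E.7.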

Figure E.12 Test of cross correlation of residuals for model orders n = 1, ..., 4, with support for the choice of model order n = 3. Confidence interval (99%) is displayed.

Estimation of the delay time T_d in the feedback loop (here T_d denotes a time delay, not the disturbance torque) is possible by checking the phase lag at high frequencies. A pure delay appears in a transfer function as the factor exp(-sT_d), which tends to dominate the phase lag in the high-frequency range. Hence, for high angular frequencies ω the phase lag is approximately ωT_d, and from a transfer function estimate (see Fig. E.7) we find, for instance, that at the frequency 10 Hz (ω ≈ 63 rad/s) the phase lag is 600°, so that we can approximate

$$ T_d \approx \frac{600}{180}\cdot\frac{\pi}{63} \approx 0.17\ \text{s} \qquad \text{(E.21)} $$


This value should be compared to other measures of the time required for a 
signal to complete a round-trip in the neurological circuit. 
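The arithmetic of Eq. (E.21) is reproduced in the short Python sketch below. The second function, a least-squares slope of phase lag against ω over several frequencies, is our suggestion for a less noise-sensitive variant and is not the procedure used in the text; both function names are our own.

```python
import math


def delay_from_phase(phase_lag_deg: float, freq_hz: float) -> float:
    """Pure-delay estimate T_d = phase lag [rad] / omega [rad/s]."""
    return math.radians(phase_lag_deg) / (2.0 * math.pi * freq_hz)


def delay_from_phase_slope(omegas: list[float], lags_rad: list[float]) -> float:
    """Least-squares slope of phase lag versus angular frequency (offset allowed)."""
    n = len(omegas)
    w_mean = sum(omegas) / n
    l_mean = sum(lags_rad) / n
    num = sum((w - w_mean) * (lag - l_mean) for w, lag in zip(omegas, lags_rad))
    den = sum((w - w_mean) ** 2 for w in omegas)
    return num / den


print(round(delay_from_phase(600.0, 10.0), 3))   # 0.167
```

Using the exact ω = 2π·10 ≈ 62.8 rad/s instead of 63 rad/s gives 0.167 s, the same value as Eq. (E.21) to two decimals.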

There is some evidence that the muscle spindles react in different ways to different vibration frequencies (see Reference [E9]), though this is not a predominant feature. Different numerical results may, however, be expected.


Maximum-likelihood identification 

The time delay was estimated to T_d ≈ 0.17 s, and it is therefore desirable to estimate model parameters with sampling intervals of this order of magnitude. The following ARMAX models all have a sampling interval of 0.20 s, which is obtained by extracting every fifth sample from the original time series (5 × 0.04 s = 0.20 s).

Parameter identification with estimation of initial values was made for model 
orders one, two, three, and four. Statistical tests are satisfied for orders n = 3 
and n = 4 but not for the second-order model. The Akaike test criteria AIC 
and FPE do not change much but do indicate n = 3 as the appropriate model 
order (see Fig. E.10). The model order of choice is therefore a third-order 
model. 
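Maximum-likelihood ARMAX estimation requires an iterative prediction-error search, which is too long to show here. As a simpler stand-in, the Python/NumPy sketch below fits ARX models (no noise polynomial) by linear least squares and compares orders with Akaike's final prediction error FPE = V(1 + d/N)/(1 - d/N), where V is the residual variance, d the number of parameters, and N the number of equations. The data are synthetic, generated from a hypothetical second-order system rather than the posture recordings, and the function names are our own. On such data the FPE should drop sharply from order 1 to order 2 and change little beyond that.

```python
import numpy as np


def fit_arx(y, u, n: int):
    """Least-squares fit of the ARX(n, n) model
        y[t] + a1*y[t-1] + ... + an*y[t-n] = b1*u[t-1] + ... + bn*u[t-n] + e[t].
    Returns (a, b, residual variance V, number of equations N)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    phi = np.array([np.concatenate((-y[t - n:t][::-1], u[t - n:t][::-1]))
                    for t in range(n, len(y))])          # regressor matrix
    theta, *_ = np.linalg.lstsq(phi, y[n:], rcond=None)
    resid = y[n:] - phi @ theta
    return theta[:n], theta[n:], float(np.mean(resid ** 2)), len(resid)


def fpe(variance: float, n_params: int, n_samples: int) -> float:
    """Akaike's final prediction error."""
    return variance * (1 + n_params / n_samples) / (1 - n_params / n_samples)


# Hypothetical second-order test system driven by a random binary input
rng = np.random.default_rng(1)
N = 2000
u = rng.choice([-1.0, 1.0], size=N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2] + e[t]

scores = {}
for order in range(1, 5):
    a, b, var, n_eq = fit_arx(y, u, order)
    scores[order] = fpe(var, 2 * order, n_eq)
    print(order, round(scores[order], 5))
```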

Validation by test of residuals 

The purpose of residual tests is to find remaining correlations, which indicate whether the model order is adequate. With an adequate model order, the residuals are white noise of sufficiently small magnitude. The ratio between the residual variance and the output variance is shown for model orders n = 1, ..., 10 in Fig. E.10. The residual tests for a third-order model give significant (95% confidence) validation with respect to changes of sign, independence of residuals, normality, and independence between residuals and input (see Fig. E.11 and Fig. E.12).
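The residual checks can be sketched as follows. The Python/NumPy functions below compute the normalized residual autocorrelation r_k (to be compared with the approximate white-noise band ±2.58/√N for 99% confidence), a portmanteau statistic Q = N Σ r_k^2, which is approximately χ²-distributed with m degrees of freedom for white residuals, and a standardized count of sign changes. These are standard textbook forms, and the function names are our own; the exact test variants behind Figs. E.11 and E.12 are not specified in the text.

```python
import math

import numpy as np


def residual_autocorr(e, max_lag: int = 20) -> np.ndarray:
    """Normalized autocorrelation r_1..r_max_lag of the residual sequence."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    denom = np.dot(e, e)
    return np.array([np.dot(e[:-k], e[k:]) / denom for k in range(1, max_lag + 1)])


def white_band(n_samples: int, z: float = 2.576) -> float:
    """Approximate 99% band +/- z/sqrt(N) for r_k of white noise."""
    return z / math.sqrt(n_samples)


def portmanteau(e, max_lag: int = 20) -> float:
    """Q = N * sum r_k^2, approximately chi-square(max_lag) for white residuals."""
    r = residual_autocorr(e, max_lag)
    return len(e) * float(np.sum(r ** 2))


def sign_change_z(e) -> float:
    """Standardized number of sign changes; about N(0, 1) for white, zero-median residuals."""
    s = np.sign(np.asarray(e, dtype=float))
    changes = int(np.sum(s[1:] != s[:-1]))
    n = len(s)
    return (changes - (n - 1) / 2) / math.sqrt((n - 1) / 4)


rng = np.random.default_rng(0)
white = rng.standard_normal(5000)
r = residual_autocorr(white)
print(np.sum(np.abs(r) > white_band(len(white))), portmanteau(white), sign_change_z(white))
```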

Validation by simulation 

Real and simulated data have been compared using the vibration signal as a 
deterministic input (see Fig. E.13). We studied to what extent the experiment 
data are explained by the deterministic input-output behavior of the estimated 
model. 

Conversion to continuous-time parameters 

For the third-order model, we have estimated an ARMAX pole polynomial

$$ A(z) = z^3 + a_1 z^2 + a_2 z + a_3 \qquad \text{(E.22)} $$

with the following parameter values and standard deviations for subject no. 6:

$$ a_1 = -1.227 \pm 0.106, \qquad a_2 = 0.734 \pm 0.154, \qquad a_3 = -0.370 \pm 0.073 \qquad \text{(E.23)} $$

where the standard deviations have been calculated from the estimated covariance matrix of the parameter estimates. Conversion to continuous-time parameters requires inverse sampling, i.e., the matrix logarithm

$$ A = \frac{1}{h}\log\Phi \qquad \text{(E.24)} $$

with a sampling interval h (here h = 0.20 s) and a dynamics matrix Φ. Computation of the continuous-time characteristic polynomial gives

$$ A(s) = s^3 + 4.97s^2 + 49.4s + 32.0 \qquad \text{(E.25)} $$

Figure E.13 Longitudinal sway for Experiment C (closed eyes, 100 Hz): input and output (upper graphs), simulated output from the estimated third-order model (lower left), and residual sequence (lower right).

This formulation allows identification of the physiological feedback parameters in terms of the third-order model pole polynomial of Eq. (E.7), where the coefficients of A(s) determine the postural behavior:

$$ A(s) = s^3 + \eta s^2 + ks + p \qquad \text{(E.26)} $$
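For distinct poles, the matrix logarithm of Eq. (E.24) amounts to mapping each discrete-time pole z_i to s_i = ln(z_i)/h. The Python/NumPy sketch below applies this to the estimates of Eq. (E.23) with h = 0.20 s and closely reproduces the coefficients of Eq. (E.25). It assumes distinct poles, none on the negative real axis (where the logarithm is not unique), ignores the standard deviations in Eq. (E.23), and uses a function name of our own.

```python
import numpy as np


def discrete_to_continuous_poly(a_z, h: float) -> np.ndarray:
    """Map A(z) = z^n + a1 z^(n-1) + ... + an to A(s) via s_i = ln(z_i) / h.

    Assumes distinct poles, none on the negative real axis.
    """
    z_poles = np.roots(a_z).astype(complex)     # discrete-time poles
    s_poles = np.log(z_poles) / h               # principal branch of the logarithm
    return np.poly(s_poles).real                # conjugate pairs give real coefficients


coeffs = discrete_to_continuous_poly([1.0, -1.227, 0.734, -0.370], h=0.20)
print(np.round(coeffs, 2))   # compare with Eq. (E.25): [1, eta, k, p]
```

Reading off coeffs[1], coeffs[2], and coeffs[3] as η, k, and p gives the entries reported for subject 6 in Table E.2.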

Parameters are already normalized in the model with respect to body weight m and body height l, in terms of the moment of inertia J, with results for the test group according to Table E.1 and Table E.2. We have given one interpretation of the coefficients in terms of a mechanical model with a spring (k) and a dashpot component (η). The more functional characterization of the motion in terms of swiftness ($\sqrt[3]{p}$), stiffness ($k/(\sqrt[3]{p})^2$), and damping ($\eta/\sqrt[3]{p}$) is based on the dynamic response. This classification describes the postural dynamics by one swiftness parameter and two stability parameters. A high value of swiftness means rapid response to disturbance, and a high value of stability means small deviations from equilibrium. The swiftness parameter is a bandwidth [rad/s] and provides information about the highest angular frequency of disturbance for which the posture control system gives adequate correction. Other useful representations are impulse and step responses (see Fig. E.14) and zero-pole diagrams and transfer functions (see Fig. E.15).

Figure E.14 Longitudinal sway. Simulated impulse response and step response of force on the plate from the estimated third-order model for open eyes (solid line) and closed eyes (dashed line), respectively.
Figure E.15 Transfer functions for stimulus-response relationships for open eyes (solid line) and closed eyes (dashed line) as obtained from third-order ARMAX models. The zero-pole diagrams for open eyes (upper) and closed eyes (lower) are shown. The symbols ‘x’ and ‘o’ denote poles and zeros whereas denotes zeros of the noise spectrum.

Some important references on dynamic posturography are 

[E2] H.C. Diener, J. Dichgans, B. Guschbauer, and M. Bacher, “Role of visual and static vestibular influences on dynamic posture control.” Human Neurobiology, Vol. 5, 1986, pp. 105-113.

[E3] N.G. Henriksson, G. Johansson, L.G. Olsson, and H. Ostlund, “Electrical analysis of the Rhomberg test.” Acta Otolaryngol. Suppl., Vol. 224, 1967, pp. 272-279.

[E4] A. Ishida and S. Miyazaki, “Maximum likelihood identification of a posture control system.” IEEE Trans. Biomedical Engineering, Vol. BME-34, 1987, pp. 1-5.

[E5] H. Ostlund (Ed.), “A study of aim and strategy of stability control in quasistationary standing.” Report from Dept. of Neurology, Dept. of Research, S:t Lars Sjukhus, Lund, Sweden, 1979.








Table E.2 Results of parametric identification with data from Experiment D with 60 Hz vibration and closed eyes

Subject    η       k       p       Swiftness   Stiffness   Damping
1          6.09    49.25   18.67   2.65        7.00        2.29
2          4.46    43.99   10.46   2.19        9.20        2.04
3          3.64    32.15   14.85   2.46        5.32        1.48
4          2.90    10.44    4.39   1.64        3.90        1.77
5          6.89    47.79   28.68   3.06        5.10        2.25
6          4.97    49.45   31.99   3.17        4.91        1.57